arXiv:1501.00682v3 [q-bio.NC] 9 Aug 2015 


Quasi-Conscious Multivariate Systems 

Jonathan W. D. Mason, Mathematical Institute, University of Oxford, UK (Submitted to Complexity 2015) 

Abstract 

Conscious experience is awash with underlying relationships. Moreover, for various brain regions such as the visual 
cortex, the system is biased toward some states. Representing this bias using a probability distribution shows that the system 
can define expected quantities. The mathematical theory in the present paper links these facts by using expected float entropy 
(efe), which is a measure of the expected amount of information needed to specify the state of the system beyond what is 
already known about the system from relationships that appear as parameters. Under the requirement that the relationship 
parameters minimise efe, the brain defines relationships. It is proposed that when a brain state is interpreted in the context 
of these relationships the brain state acquires meaning in the form of the relational content of the associated experience. For 
a given set, the theory represents relationships using weighted relations which assign continuous weights, from 0 to 1, to 
the elements of the Cartesian product of that set. The relationship parameters include weighted relations on the nodes of the 
system and on their set of states. Examples obtained using Monte-Carlo methods (where relationship parameters are chosen 
uniformly at random) suggest that efe distributions with long left tails are most important. 

1 Introduction 


In the present paper we further develop the theory introduced in the article ‘Consciousness and the structuring property 
of typical data’ (see HI), and demonstrate and investigate the theory through applications in a number of examples using 
computational methods. 

It is intended that the theory will provide a way into the mathematics that underpins how the brain defines the relational 
content of consciousness. Indeed, conscious experience clings to a substrate of underlying relationships; points in a person’s 
field of view can be strongly related (if close together) or unrelated (if far apart), giving geometry; colours can appear similar 
(e.g. red and orange) or completely different (e.g. red and green). We can make a very long list of such examples of relations 
involving different sounds, smells, tastes and locations of touch. Furthermore, at a higher semantic level involving several 
brain regions, if we see someone we know and hear a person’s name then we know whether the name relates to that person. 
It is hard to think of any conscious experience that does not involve relations. Whilst it is difficult to explain how the 
brain defines the colour blue, in the present paper we hope to provide the beginnings of a mathematical theory for how the 
brain defines all of the relations underlying consciousness and, therefore, explain why, for example, blue appears similar to 
turquoise but different to red. It is proposed that when a brain state is interpreted in the context of all these relations, defined 
by the brain, the brain state acquires meaning in the form of the relational content of the experience. If we consider the 
relations defined by the brain to be a type of statistic then we have the following analogy. A single observation of a one 
dimensional random variable is almost meaningless, but in the context of the statistics of the random variable, such as mean 
and variance, the observation has meaning. For arguments in support of this approach, the reader is referred to HI. 

The issue of how a system such as the brain defines relations is crucial. Importantly, for various brain regions such as the 
visual cortex, (under temporally well spaced observations of the system) the probability distribution over the different possible 
states of the system is far from being uniform owing to learning rules of which the Bienenstock, Cooper and Munro (BCM) 
version of Hebbian theory is one candidate; see fJl, |0 and ||4|. Hence, the brain is not merely driven by the current sensory 
input, but is biased toward certain states as a result of a long history of sensory inputs. The probability distribution over the 
states of the system is therefore a property of the system itself, allowing the system to define expected quantities. 

In the theory presented in the present paper, the brain defines relations under the requirement that the expected quantity of 
a particular type of entropy is minimised. We call this entropy float entropy. For a collection of relations on the system and 
any given state of the system, the float entropy of the state is a measure of the amount of information required, in addition 
to the information given by the relations, in order to specify that state. We make the definition of float entropy precise 
in Subsection 1.1. However, later in the present paper we will give a more general definition (multi-relational float entropy) 
which allows the involvement of more than two relations; see Subsection 4.1. We will also consider a time dependent version, 
and the theory of the present paper will be compared with Integrated Information Theory and Shannon entropy. 

1.1 Definitions 

In this subsection we provide the main definitions in the present paper. Systems such as the brain, and its various regions, are 
networks of interacting nodes. In the case of the brain we may take the nodes of the system to be the individual neurons or 
possibly larger structures such as cortical columns. The nodes of the system have a repertoire (range) of states that they can 
be in. For example, the states that neurons can be in could be associated with different firing frequencies. In the present paper 
we assume that the node repertoire is finite (as was assumed in HI), and the state of the system is the aggregate of the states 
of the nodes. 

The original theory in HI used a mainly set theoretic approach, where a relation on a nonempty set S was usually taken to 
be a binary relation $R \subseteq S^2$. Weighted relations (see below) are slightly more general than binary relations, and the further 
development (presented in the present paper) of the original theory uses weighted relations because they allow a system to 
define a weighted relation on the repertoire of its nodes. This is desirable as we will see from examples later in the paper. 

In Definition 1.1 the elements of the set S are to be taken as the nodes of the system. 

Definition 1.1. Let S be a nonempty finite set, $n := \#S$. Then a data element for S is a set (having a unique arbitrary index 
label i) 

$$S_i := \{(a, f_i(a)) : a \in S\}, \quad \text{where } f_i : S \to V \text{ is a map}$$

and $V := \{v_1, v_2, \ldots, v_m\}$ is the node repertoire. The set of all data elements for S given V is $\Omega_{S,V}$, so that $\#\Omega_{S,V} = m^n$. 
For temporally well spaced observations, it is assumed that a given finite system defines a random variable with probability 
distribution $P : \Omega_{S,V} \to [0,1]$ for some finite set S and node repertoire V. If T is a finite set of numbered observations of the 
system then T is called the typical data for S. The elements of T (called typical data elements) are handled using a function 

$$\tau : \{1, \ldots, \#T\} \to \{i : S_i \in \Omega_{S,V}\},$$

where $S_{\tau(k)}$ is the value of observation number k for $k \in \{1, \ldots, \#T\}$. In particular, the function $\tau$ need not be injective since 
small systems may be in the same state for several observations. 

Remark 1.1. Note that P in Definition 1.1 extends to a probability measure on the power set of $\Omega_{S,V}$ by defining 

$$P(A) := \sum_{S_i \in A} P(S_i), \quad \text{for } A \in 2^{\Omega_{S,V}}.$$

Hence, we have a probability space $(\Omega_{S,V}, 2^{\Omega_{S,V}}, P)$ with sample space $\Omega_{S,V}$, sigma-algebra $2^{\Omega_{S,V}}$, and probability measure P. 
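
As a concrete illustration of Definition 1.1, the following minimal Python sketch enumerates $\Omega_{S,V}$ for a toy system and checks that $\#\Omega_{S,V} = m^n$. Storing a data element as a tuple of node states, and the names `all_data_elements`, `tau` and `typical_data`, are assumptions made purely for illustration; they are not taken from the software used for the paper.

```python
from itertools import product

def all_data_elements(n_nodes, repertoire):
    """Enumerate Omega_{S,V}: every assignment of a repertoire state to each node.

    A data element S_i is stored as a tuple whose a-th entry is f_i(a).
    """
    return list(product(repertoire, repeat=n_nodes))

# Toy system (illustrative values): n = 3 nodes, repertoire V = {0, 1}, so m = 2.
repertoire = (0, 1)
omega = all_data_elements(3, repertoire)
assert len(omega) == len(repertoire) ** 3  # #Omega_{S,V} = m^n

# Typical data: numbered observations given as indices into omega (the map tau).
# Repeated indices are allowed because tau need not be injective.
tau = [0, 5, 0, 7]
typical_data = [omega[i] for i in tau]
print(len(omega), typical_data)
```

Here the first and third observations coincide, which is exactly the situation the non-injectivity of $\tau$ allows for.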
We now need the definition of a weighted relation. 

Definition 1.2 (Weighted relations). Let S be a nonempty set. A weighted relation on S is a function of the form 

$$R : S^2 \to [0,1],$$

where [0,1] is the unit interval. We say that R is: 

1. reflexive if $R(a,a) = 1$ for all $a \in S$; 

2. symmetric if $R(a,b) = R(b,a)$ for all $a,b \in S$. 

The set of all reflexive, symmetric weighted relations on S is denoted $\Psi_S$. 

Remark 1.2. Except where stated, the weighted relations used in the present paper are reflexive and symmetric. Relative 
to such a weighted relation, the value $R(a,b)$ quantifies the strength of the relationship between a and b, interpreted in 
accordance with the usual order structure on [0,1] so that $R(a,b) = 1$ is a maximum. For a small finite set, it is useful to 
display a weighted relation on that set as a weighted relation table (i.e. as a matrix). 

Before giving the definition of float entropy we require Definitions 1.3 and 1.4. 

Definition 1.3. Let S be as in Definition 1.1 and let $U : V^2 \to [0,1]$ be a reflexive, symmetric weighted relation on the node 
repertoire V; i.e. $U \in \Psi_V$. Then, for each data element $S_i \in \Omega_{S,V}$, define a function $R\{U,S_i\} : S^2 \to [0,1]$ by setting 

$$R\{U,S_i\}(a,b) := U(f_i(a), f_i(b)) \quad \text{for all } a,b \in S.$$

It is easy to see that $R\{U,S_i\} \in \Psi_S$. 
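
The construction in Definition 1.3 can be sketched in a few lines of Python. This is a minimal illustration, assuming that repertoire states are encoded as the indices 0, ..., m-1 and that weighted relations are stored as nested lists; the function name `induced_relation` and the numerical values of U are hypothetical.

```python
def induced_relation(U, state):
    """Return R{U,S_i} as a matrix: entry (a, b) is U(f_i(a), f_i(b)).

    U is a square matrix indexed by repertoire state; `state` lists the
    repertoire index f_i(a) of every node a.
    """
    return [[U[sa][sb] for sb in state] for sa in state]

# Illustrative weighted relation on a three state repertoire (values made up).
U = [
    [1.0, 0.4, 0.1],
    [0.4, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
state = (0, 2, 2, 1)  # node states f_i(a) for four nodes
R_induced = induced_relation(U, state)

# R{U,S_i} inherits reflexivity and symmetry from U.
n = len(state)
reflexive = all(R_induced[a][a] == 1.0 for a in range(n))
symmetric = all(R_induced[a][b] == R_induced[b][a] for a in range(n) for b in range(n))
print(reflexive, symmetric)
```

Two nodes that are in the same state (nodes 2 and 3 above) are related with weight 1 in the induced relation, because U is reflexive.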

Definition 1.4. Let S be a nonempty finite set. Every weighted relation on S can be viewed as a $(\#S)^2$-dimensional real vector. 
Hence, the $d_n$ metric is a metric on the set of all such weighted relations by setting 

$$d_n(R,R') := \Big( \sum_{(a,b) \in S^2} |R(a,b) - R'(a,b)|^n \Big)^{1/n},$$

where R and R' are any two weighted relations on S. Similarly we have the metric $d_\infty(R,R') := \max_{(a,b) \in S^2} |R(a,b) - R'(a,b)|$. 

Definition 1.5 (Float entropy). Let S be as in Definition 1.1, let $U \in \Psi_V$, and let $R \in \Psi_S$. The float entropy of a data element 
$S_i \in \Omega_{S,V}$, relative to U and R, is defined as 

$$\mathrm{fe}(R,U,S_i) := \log_2\big(\#\{S_j \in \Omega_{S,V} : d(R,R\{U,S_j\}) \le d(R,R\{U,S_i\})\}\big),$$

where, in the present paper (unless otherwise stated), d is the $d_1$ metric. Furthermore, let $P : \Omega_{S,V} \to [0,1]$ and T be as in 
Definition 1.1. The expected float entropy, relative to U and R, is defined as 

$$\mathrm{efe}(R,U,P) := \sum_{S_i \in \Omega_{S,V}} P(S_i)\, \mathrm{fe}(R,U,S_i).$$

The efe(R,U,T) approximation of efe(R,U,P) is defined as 

$$\mathrm{efe}(R,U,T) := \frac{1}{\#T} \sum_{i=1}^{\#T} \mathrm{fe}(R,U,S_{\tau(i)}),$$

where $\tau$ need not be injective by Definition 1.1. By construction, efe is measured in bits per data element (bpe). 
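
To make Definition 1.5 concrete, here is a minimal Python sketch that computes float entropy by brute-force enumeration of $\Omega_{S,V}$ and then averages it over typical data. It assumes states are encoded as indices 0, ..., m-1, uses the $d_1$ metric, and adds a small tolerance `tol` to the comparison to guard against floating-point rounding; the tolerance and all function names are assumptions of this sketch, not part of the definition.

```python
from itertools import product
from math import log2

def induced_relation(U, state):
    """R{U,S_i}(a, b) = U(f_i(a), f_i(b)), stored as a matrix."""
    return [[U[sa][sb] for sb in state] for sa in state]

def d1(R1, R2):
    """d_1 metric: sum of absolute entrywise differences."""
    return sum(abs(x - y) for r1, r2 in zip(R1, R2) for x, y in zip(r1, r2))

def float_entropies(R, U, n_nodes, m_states, tol=1e-12):
    """Return {state: fe(R, U, state)} for every data element in Omega_{S,V}."""
    omega = list(product(range(m_states), repeat=n_nodes))
    dist = {s: d1(R, induced_relation(U, s)) for s in omega}
    # fe counts the data elements whose induced relation is at least as close to R.
    return {s: log2(sum(1 for t in omega if dist[t] <= dist[s] + tol)) for s in omega}

def efe_from_typical_data(R, U, typical_data, m_states):
    """efe(R, U, T): mean float entropy over the observations in T."""
    fe = float_entropies(R, U, len(typical_data[0]), m_states)
    return sum(fe[s] for s in typical_data) / len(typical_data)

# Toy example: two nodes, two states, U the Kronecker delta, R constant 1.
U = [[1.0, 0.0], [0.0, 1.0]]
R = [[1.0, 1.0], [1.0, 1.0]]
T = [(0, 0), (1, 1), (0, 0), (0, 1)]
toy_efe = efe_from_typical_data(R, U, T, 2)
print(toy_efe)
```

In this toy case the states (0,0) and (1,1) induce exactly R, so each has float entropy $\log_2 2 = 1$, while (0,1) is as far from R as any state can be and has float entropy $\log_2 4 = 2$. The enumeration costs on the order of $(m^n)^2$ comparisons, so this sketch is only practical for small systems.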

It is proposed that a system (such as the brain and its subregions) will define U and R (up to a certain resolution) under 
the requirement that the efe is minimised. Hence, for a given system (i.e. for a fixed P), we attempt to find solutions in U and 
R to the equation 


$$\mathrm{efe}(R,U,P) = \min_{R' \in \Psi_S,\ U' \in \Psi_V} \mathrm{efe}(R',U',P). \tag{1}$$

In practice we replace efe(·,·,P) in (1) with efe(·,·,T). 

Remark 1.3. In Definition 1.5 the $d_1$ metric is used. It turns out that, amongst many metrics, a change in metric has only a 
small effect on the solutions to (1). There are also plenty of pathological metrics which, when used, will significantly change 
the solutions to (1). In Remark 1.2 we mentioned that, for a weighted relation, the value of $R(a,b)$ is interpreted in accordance 
with the usual order structure on [0,1]. We argue that the order structure to be used on [0,1] should be determined by the 
metric that is being used in Definition 1.5. Hence, for a pathological metric, whilst the solutions to (1) will have changed, 
their interpretation as weighted relations may be largely unchanged when the order structure used on [0,1] is determined 
by the metric being used (when this makes sense). In practice, we want to use the usual order structure on [0,1], and this 
requirement limits which metrics should be used in Definition 1.5. We will look at the issue of metrics in some detail in 
Subsection 3.3. 
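
For readers who wish to experiment with the choice of metric discussed in Remark 1.3, the metrics of Definition 1.4 can be sketched as follows. This is an illustrative Python fragment assuming weighted relations are stored as nested lists; the function names `d_n` and `d_inf` simply mirror the notation and the example values are made up.

```python
def d_n(R1, R2, n=1):
    """d_n metric: n-th root of the sum of |R1(a,b) - R2(a,b)|^n over S^2."""
    total = sum(abs(x - y) ** n for row1, row2 in zip(R1, R2) for x, y in zip(row1, row2))
    return total ** (1.0 / n)

def d_inf(R1, R2):
    """d_infinity metric: largest absolute entrywise difference."""
    return max(abs(x - y) for row1, row2 in zip(R1, R2) for x, y in zip(row1, row2))

# Two illustrative weighted relations on a two element set.
R1 = [[1.0, 0.5], [0.5, 1.0]]
R2 = [[1.0, 0.25], [0.25, 1.0]]
print(d_n(R1, R2, 1), d_n(R1, R2, 2), d_inf(R1, R2))
```

Note that the sums run over all ordered pairs in $S^2$, so each off-diagonal difference of a symmetric relation is counted twice.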

Remark 1.4. The theory presented in the present paper uses the definitions in Subsection 1.1. Suppose we restricted these 
definitions so that the only weighted relation we could use on the node repertoire V was the Kronecker delta, and the only 
elements of $\Psi_S$ we could use were weighted relations taking values in the two point set {0,1}. Then, under these restrictions, 
Definition 1.5 would yield a definition of float entropy equivalent to that given in 43?. Indeed, note that a weighted relation 
$R : S^2 \to \{0,1\}$ is given by the indicator function for the relation $\{(a,b) \in S^2 : R(a,b) = 1\} \subseteq S^2$. Hence, the theory presented 
in the present paper is indeed a development of the theory presented in 43?. 

Remark 1.5. With reference to Remark 1.1 and Definition 1.5, for $A \in 2^{\Omega_{S,V}}$ we have the weak conditional efe 

$$\mathrm{efe}(R,U,P \mid A) := \sum_{S_i \in \Omega_{S,V}} P(S_i \mid A)\, \mathrm{fe}(R,U,S_i).$$

Weak conditional efe can be useful when considering a system that has entered a particular mode such that this mode restricts 
the system to a particular set of data elements. There may be other useful definitions of conditional efe. 
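
The weak conditional efe of Remark 1.5 can be sketched as below. The sketch assumes the usual conditional probability $P(S_i \mid A) = P(S_i)/P(A)$ for $S_i \in A$ (and zero otherwise), which requires $P(A) > 0$; it also assumes the float entropies have already been computed and are supplied as a dictionary. The names `weak_conditional_efe`, `P`, `fe` and `A`, and the numbers used, are illustrative only.

```python
def weak_conditional_efe(P, fe, A):
    """Sum of P(S_i | A) * fe(S_i), with P(S_i | A) = P(S_i) / P(A) on A and 0 elsewhere."""
    p_A = sum(P[s] for s in A)
    if p_A <= 0:
        raise ValueError("conditioning event A must have positive probability")
    return sum(P[s] / p_A * fe[s] for s in A)

# Illustrative two node, two state system (probabilities and entropies made up).
P = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
fe = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): 2.0, (1, 0): 2.0}

# Mode in which the system is restricted to states where the two nodes differ.
A = {(0, 1), (1, 0)}
value = weak_conditional_efe(P, fe, A)
print(value)
```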

1.2 Advantages of the theory and overview 

The examples in the present paper are intended to have relevance to the visual cortex and our experience of monocular vision. 
In lieu of typical data for the visual cortex we apply the theory to typical data for digital photographs of the world around us. 
If the theory, as used in the examples, is relevant to the visual cortex then the examples show that the perceived relationships 
between different colours, the perceived relationships between different brightnesses, and the perceived relationships between 
different points in a person’s field of view (giving geometry) are all defined by the brain in a mutually dependent way. 
Hence, in this case, there is a connection between the relationships that underlie colour perception and our perception of the 
underlying geometry of the world around us. Of course the states of the visual cortex are somewhat more complicated than 
digital photographs since some neurons have sophisticated receptive fields. However, the theory presented in the present 
paper does not assume that the nodes of the visual cortex have to be individual neurons. Instead, each node can consist of 
many neurons; effectively representing the data elements using a larger base (note that we can think of the node repertoire 
as being analogous to a choice of base in the representation of integers). Hence, the examples could well be relevant to the 
visual cortex. A preliminary discussion and investigation regarding base is presented in Subsection 3.1. 

We also apply the theory to a system where the probability distribution P in Definition 1.1 is uniform over $\Omega_{S,V}$. In this case 
the solutions to (1) vary greatly (instead of all being similar) and, hence, the system fails to define weighted relations that 
give a coherent interpretation of the states of the system. The variation in the solutions to (1) is partly due to symmetries, and 
this is discussed in Example 3.4. 

It is argued in mi that the theory presented there provides a solution to the binding problem and avoids the homunculus 
fallacy. Those arguments also apply to the theory presented in the present paper. In particular, consciousness is not the output 
of some algorithmic process but it may instead, largely, be the states of the system interpreted in the context of the weighted 
relations that minimise expected float entropy, where here we are talking about a definition of float entropy that involves more 
than the two weighted relations used in (1); see Subsection 4.1. This argument may become clearer for the reader after going 
through the examples in the present paper. The rest of the paper is organised as follows. Section 2 looks at obtaining typical 
data from digital photographs, and specifies the computational methods used for finding solutions to (1). Section 3 provides 
six examples in which the theory is applied. We continue the development of the theory by looking at changing the base of 
a system, joining and partitioning systems, and metric independence. Section 4 provides generalisations of Definition 1.5, a 
comparison between the present theory and both Giulio Tononi’s Integrated Information Theory (IIT) and Shannon entropy, 
followed by the conclusion. Appendix A lists the software used, and Appendix B provides a list of notation. 

2 Typical data and computational methods 

In this section we look at obtaining typical data from digital photographs, a binary search algorithm for finding solutions 
to (1), and using efe-histograms to assess guesses when guessing solutions to (1). 

2.1 Typical data from digital photographs 

When obtaining a typical data element from a digital photograph, in the present paper, only a small part of the photograph 
is used. This is because the computational methods used in the present paper are suitable for small systems {#Q.s.v < 10®) 
although, at the expense of clarity and ease of implementation, other more efficient computational methods are possible for 
investigating larger systems; see Appendix A, which lists the software used during the research for the present paper and 
provides a discussion on more efficient computational methods. 

Figure 1 shows the sampling of a digital photograph such that the typical data element obtained is for a system comprised 
of five nodes with a four state node repertoire ($\#\Omega_{S,V} = 1024$). Also, in the case of Figure 1 we are using pixel brightness to 
determine node state. From top-left to bottom-right, the first image is the original. This image is desaturated (the colours are 
turned into shades of gray) and then the contrast is enhanced. The contrast enhancement is not required, but it was thought 
that it might reduce the number of typical data elements needed in order to obtain meaningful results. Indeed, when similar, 
the solutions to (1) are rather like a type of statistic and, therefore, when using typical data we need to make sure that the 
sample size is large enough. The image is then posterised (in this case the number of shades is reduced to four giving a four 
state node repertoire). Finally, five pixels are sampled giving the state of each of the five nodes of the typical data element; 
see Table 1. To obtain the typical data for the system, this way of obtaining typical data elements is used for several hundred 
digital photographs. Importantly, whatever the geometric layout of the pixel sampling locations (in Figure 1 the layout is 
part of a grid that has adjacent locations every ten pixels), the same layout must be used for all of the digital photographs. 
Similarly, the same criteria must be used for determining the node states. 

Figure 1: Digital photograph sampling using five nodes and a four shade gray scale. 

Figure 2: Digital photograph sampling using five nodes and a nine colour red/green palette. 

Table 1: Node states of the typical data element obtained from the sampling in Figure 1. 

                 node 1    node 2    node 3    node 4    node 5 
  S_{τ(i)}       0.000     147.224   441.673   441.673   147.224 
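
The sampling procedure described above can be sketched in Python as follows. This is only a rough illustration under several assumptions: the image is assumed to be already desaturated and given as a nested list of brightness values in 0..255, posterisation is approximated by equal-width binning into state indices, and the layout and pixel values are made up. The paper itself performs these steps on photographs with image-editing tools, so the functions `posterise` and `sample_data_element` are hypothetical stand-ins.

```python
def posterise(brightness, levels=4, max_value=255):
    """Map a brightness in [0, max_value] to one of `levels` equal-width bins (0 .. levels-1)."""
    return min(levels - 1, int(brightness * levels / (max_value + 1)))

def sample_data_element(image, layout, levels=4):
    """Read the posterised state of each node from fixed (row, col) pixel locations."""
    return tuple(posterise(image[r][c], levels) for r, c in layout)

# The same sampling layout must be reused for every photograph.
LAYOUT = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)]

# Two tiny made-up grayscale "photographs".
photo_a = [[0, 50, 100], [128, 200, 255], [64, 10, 190]]
photo_b = [[255, 0, 255], [0, 130, 0], [30, 0, 70]]

typical_data = [sample_data_element(p, LAYOUT) for p in (photo_a, photo_b)]
print(typical_data)
```

Each photograph contributes one typical data element, so several hundred photographs yield typical data of the size used in the examples.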

The sampling in Figure 2 obtains a typical data element for a system comprised of five nodes with a nine state node repertoire 
($\#\Omega_{S,V} = 59049$). Here node state is determined by pixel colour over a red/green palette. From top-left to bottom-right, we 
first have the original image to which colour contrast enhancement is applied. The image is then restricted to colours made up 
of red and green by setting blue values to zero. The image is then posterised (three values for red and three values for green 
are used giving a nine state node repertoire). Finally, five pixels are sampled; the result is given in Table 2. We now consider 
computational methods for finding solutions to (1). 

Table 2: Node states of the typical data element obtained from the sampling in Figure 2. 

                 node 1    node 2    node 3    node 4    node 5 
  S_{τ(i)}       128,128   255,255   255,255   128,128   128,128 


2.2 Binary search algorithm 

For any given system, let $n = \#S$ and $m = \#V$. 

Step 1. The initial approximation of a solution to (1) is taken to be the pair $U \in \Psi_V$ and $R \in \Psi_S$ with $U(v,v') = \frac{1}{2}$ and $R(a,b) = \frac{1}{2}$ 
for all $v,v' \in V$, $v \neq v'$, and $a,b \in S$, $a \neq b$, respectively. 


Step 2. For U and R (shown in Table 3), a given approximate solution to (1), let $k = 2^{-(q+1)}$ where $q = \min\{j \in \mathbb{N} : 2^j u_{1,2} \in \mathbb{N}\}$. 
We now calculate the efe value of the system for each combination of the entries in Table 4 that give symmetric 
weighted relations. This is a binary search in the sense that there are two options per entry. 

Table 3: Approximate solution to Equation (1). 

  U        v_1        v_2        v_3        ... 
  v_1      1          u_{1,2}    u_{1,3}    ... 
  v_2      u_{2,1}    1          u_{2,3}    ... 
  v_3      u_{3,1}    u_{3,2}    1          ... 
  ... 

  R        node 1     node 2     node 3     ... 
  node 1   1          r_{1,2}    r_{1,3}    ... 
  node 2   r_{2,1}    1          r_{2,3}    ... 
  node 3   r_{3,1}    r_{3,2}    1          ... 
  ... 

Table 4: Binary entries over which to search for approximate solutions to (1). 

  U        v_1            v_2            v_3            ... 
  v_1      1              u_{1,2} ± k    u_{1,3} ± k    ... 
  v_2      u_{2,1} ± k    1              u_{2,3} ± k    ... 
  v_3      u_{3,1} ± k    u_{3,2} ± k    1              ... 
  ... 

  R        node 1         node 2         node 3         ... 
  node 1   1              r_{1,2} ± k    r_{1,3} ± k    ... 
  node 2   r_{2,1} ± k    1              r_{2,3} ± k    ... 
  node 3   r_{3,1} ± k    r_{3,2} ± k    1              ... 
  ... 


Step 3. If the minimum of the efe values, obtained in Step 2, was given by only one of the pairs of weighted relations tested in 
Step 2 then redefine U and R as this new pair of weighted relations and return to Step 2. Otherwise, output U, R and 
their associated efe value, and stop. 

If the algorithm did not stop then the chronology of approximate solutions, given by the applications of Step 3, would be a 
convergent sequence with respect to $d_1$ and any of the metrics in Definition 1.4. However, for m > 2 and n > 2, both $\Psi_V$ 
and $\Psi_S$ are uncountably infinite sets, whereas the number of possible efe values is finite. Hence, some efe values result from 
infinitely many weighted relations in $\Psi_V$ and $\Psi_S$. It is not surprising then that, as the approximate solutions become closer 
with respect to $d_1$, ultimately the algorithm stops at Step 3. In short, the system defines U and R (up to a certain resolution) 
under the requirement that the efe is minimised. 
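
Steps 1 to 3 can be sketched in Python for very small systems as follows. This is a minimal brute-force illustration, not the software used for the paper. It assumes states are encoded as indices 0, ..., m-1, represents U and R by their off-diagonal upper-triangular entries, halves the step size k after each cycle (which agrees with the rule in Step 2 because all entries are updated together, starting from 1/2), and adds two things that are not in the algorithm as stated: a floating-point tolerance `tol` and a safety cap `max_cycles`. All function names and the toy typical data are hypothetical.

```python
from itertools import product
from math import log2

def symmetric_matrix(size, upper):
    """Build a reflexive, symmetric matrix from its off-diagonal upper-triangular entries."""
    M = [[1.0] * size for _ in range(size)]
    entries = iter(upper)
    for i in range(size):
        for j in range(i + 1, size):
            M[i][j] = M[j][i] = next(entries)
    return M

def efe(u, r, typical, n, m, tol=1e-12):
    """efe(R, U, T) with the d_1 metric; U and R are given by upper-triangular lists."""
    U, R = symmetric_matrix(m, u), symmetric_matrix(n, r)
    dist = {s: sum(abs(R[a][b] - U[s[a]][s[b]]) for a in range(n) for b in range(n))
            for s in product(range(m), repeat=n)}
    all_d = list(dist.values())
    fe = {s: log2(sum(1 for d in all_d if d <= dist[s] + tol)) for s in set(typical)}
    return sum(fe[s] for s in typical) / len(typical)

def binary_search(typical, n, m, max_cycles=20, tol=1e-12):
    nu, nr = m * (m - 1) // 2, n * (n - 1) // 2
    params = [0.5] * (nu + nr)   # Step 1: every off-diagonal entry starts at 1/2
    k = 0.25                     # step size for the first cycle
    cycles = 0
    while cycles < max_cycles:
        results = []
        for signs in product((-1, 1), repeat=nu + nr):   # Step 2: two options per entry
            cand = [p + s * k for p, s in zip(params, signs)]
            results.append((efe(cand[:nu], cand[nu:], typical, n, m), cand))
        best = min(v for v, _ in results)
        winners = [c for v, c in results if v <= best + tol]
        if len(winners) != 1:    # Step 3: stop unless the minimiser is unique
            break
        params, k, cycles = winners[0], k / 2, cycles + 1
    return params[:nu], params[nu:], efe(params[:nu], params[nu:], typical, n, m), cycles

# Toy typical data for three nodes with two states (made up, biased toward uniform states).
T = [(0, 0, 0), (1, 1, 1), (0, 0, 1), (0, 0, 0), (1, 1, 1), (0, 1, 1), (0, 0, 0), (1, 1, 1)]
u, r, value, cycles = binary_search(T, n=3, m=2)
print(u, r, value, cycles)
```

Because every entry starts at 1/2 and moves by 1/4, 1/8, ..., the entries stay strictly inside (0,1) and are exactly representable in binary floating point, so the distance comparisons in this sketch are exact for the first few dozen cycles.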

This search algorithm works well; see its use in Section 3. However, the number of efe values calculated during each 
application of Step 2 is $2^{(n(n-1)+m(m-1))/2}$. For example, a system with #S = #V = 5 can result in the algorithm calculating 
more than 10^ efe values before stopping. Hence, the present paper also uses the following, computationally less expensive, 
method for approximating solutions to (1); also see Appendix A concerning more efficient computational methods. 
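
As a worked instance of the count for one application of Step 2, take $n = m = 5$. There are $n(n-1)/2 = 10$ free entries in R and $m(m-1)/2 = 10$ free entries in U, each with two options:

```latex
\[
  2^{(n(n-1)+m(m-1))/2}\Big|_{n=m=5}
  \;=\; 2^{(20+20)/2}
  \;=\; 2^{20}
  \;=\; 1\,048\,576 .
\]
```

So a single cycle already requires over a million efe evaluations, and this number is multiplied by the number of cycles completed before the algorithm stops.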

2.3 Using efe-histograms obtained from Monte-Carlo methods 

Here we choose $U \in \Psi_V$ and $R \in \Psi_S$ uniformly at random. With reference to Table 3, this is done by choosing each 
off-diagonal upper-triangular entry of U and R uniformly at random from the interval [0,1] (the off-diagonal lower-triangular 
entries are then those making U and R symmetric). The efe value is then calculated and stored, and the whole process is 
repeated producing a list of many thousands of efe observations from which an efe-histogram can be obtained. With this 
setup, if we wish to treat efe as a random variable then standard methods can be used for approximating the probability 
distribution from the efe values (although this can be difficult for distributions with very thin tails). In any case, provided 
enough observations are made, the efe-histogram can be used to help assess guesses when guessing approximate solutions 
to (1). 

However, we need to be careful concerning what is meant by ‘choose uniformly at random from the interval [0,1]’. Usually, 
this means that all subintervals of the same length are equally probable events. This is fine for us as long as the length of 
subintervals is determined by the metric used in Definition 1.5, which conveniently is $d_1$; see Subsection 3.3 for relevant 
details. 
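
The Monte-Carlo procedure of this subsection can be sketched in Python as below. The sketch assumes states are encoded as indices 0, ..., m-1, uses Python's standard `random` module (whose `random()` is uniform on [0,1) with respect to the usual length), and bins the efe values with a simple counter. The function names, the seed, the sample size and the toy typical data are assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import product
from math import floor, log2

def symmetric_matrix(size, upper):
    """Build a reflexive, symmetric matrix from its off-diagonal upper-triangular entries."""
    M = [[1.0] * size for _ in range(size)]
    entries = iter(upper)
    for i in range(size):
        for j in range(i + 1, size):
            M[i][j] = M[j][i] = next(entries)
    return M

def efe(u, r, typical, n, m, tol=1e-12):
    """efe(R, U, T) with the d_1 metric; U and R are given by upper-triangular lists."""
    U, R = symmetric_matrix(m, u), symmetric_matrix(n, r)
    dist = {s: sum(abs(R[a][b] - U[s[a]][s[b]]) for a in range(n) for b in range(n))
            for s in product(range(m), repeat=n)}
    all_d = list(dist.values())
    fe = {s: log2(sum(1 for d in all_d if d <= dist[s] + tol)) for s in set(typical)}
    return sum(fe[s] for s in typical) / len(typical)

def efe_histogram(typical, n, m, samples=500, bin_width=0.05, seed=0):
    """Sample U and R uniformly at random and bin the resulting efe values."""
    rng = random.Random(seed)
    nu, nr = m * (m - 1) // 2, n * (n - 1) // 2
    values = []
    for _ in range(samples):
        u = [rng.random() for _ in range(nu)]   # upper-triangular entries of U
        r = [rng.random() for _ in range(nr)]   # upper-triangular entries of R
        values.append(efe(u, r, typical, n, m))
    bins = Counter(floor(v / bin_width) for v in values)  # bin index -> count
    return values, bins

# Toy typical data for three nodes with two states (made up).
T = [(0, 0, 0), (1, 1, 1), (0, 0, 1), (0, 0, 0), (1, 1, 1), (0, 1, 1), (0, 0, 0), (1, 1, 1)]
values, bins = efe_histogram(T, n=3, m=2)
print(min(values), sum(values) / len(values), max(values))
```

A guessed solution can then be assessed by comparing its efe value with the left tail of the sampled values, for example by counting how many samples fall below it.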

We are now ready to apply the theory. 

3 Examples and investigations 

This section provides insight concerning how the theory performs in practice by way of several informative examples and 
investigations. 

Example 3.1. In this example 200 digital photographs of the world around us are used. The typical data is obtained using 
exactly the method shown in Figure 1, where the photographs have a four shade gray scale. Hence, #T = 200 and the system 
is comprised of five nodes with a four state node repertoire ($\#\Omega_{S,V} = 1024$). The binary search algorithm of Subsection 2.2 
was applied to T and, after ten cycles, returned the weighted relations in Table 5. Figure 3 provides a graph illustration of 
the weighted relations. For U, values above 0.2 are indicated with a solid line, whilst values from 0.02 to 0.2 are indicated 
with a dash line. For R, values above 0.9 are indicated with a solid line, whilst values from 0.75 to 0.9 are indicated with 
a dash line. Although #T = 200 is rather small, T has defined the correct relationships under the requirement that efe is 
minimised. As described in Subsection 2.3, Figure 4 provides an efe-histogram for T. For U and R in Table 5 we have 
efe(R,U,T) = 4.91623, to six sf, and this value is indicated in Figure 4 by the triangular marker furthest to the left. The 
efe-histogram is negatively skewed with a long left tail and this shape is usual for systems where the probability distribution P, 
in Definition 1.1, is far from uniform over $\Omega_{S,V}$. 

Table 5: Approximate solution for Example 3.1. 

  U          0                147.224          294.449          441.673 
  0          1                0.30908203125    0.05224609375    0.00439453125 
  147.224    0.30908203125    1                0.41064453125    0.10400390625 
  294.449    0.05224609375    0.41064453125    1                0.34228515625 
  441.673    0.00439453125    0.10400390625    0.34228515625    1 

  R          node 1           node 2           node 3           node 4           node 5 
  node 1     1                0.99853515625    0.62353515625    0.92041015625    0.78369140625 
  node 2     0.99853515625    1                0.94580078125    0.75244140625    0.93505859375 
  node 3     0.62353515625    0.94580078125    1                0.73486328125    0.88330078125 
  node 4     0.92041015625    0.75244140625    0.73486328125    1                0.98193359375 
  node 5     0.78369140625    0.93505859375    0.88330078125    0.98193359375    1 

Figure 3: Graph illustration of the weighted relations in Table 5 showing strongest relationships 
(solid lines) and intermediate relationships (dash lines). 

Figure 4: An efe-histogram for Example 3.1 using 200,000 observations and a bin interval of 
0.01. For each cycle of the binary search algorithm, the efe value of the approximate solution 
obtained is shown (triangular marker). 

Example 3.2 involves a larger system than that of Example 3.1. Here enlarging the system results in an increase in the 
difference between the minimum efe and the location (mean or median) of the efe-histogram. 

Example 3.2. In this example 400 digital photographs of the world around us are used. The typical data is obtained using 
the method shown in Figure 1, except the number of sampling locations is increased from five to nine to form a three by three 
grid. Since #T = 400 and $\#\Omega_{S,V} = 4^9 = 262144$, this system is too large to apply the binary search algorithm. Instead, 
Table 5 in Example 3.1 was used to guess an approximate solution. Figure 5 provides an efe-histogram for T. The efe value 
for the approximate solution is indicated with a triangular marker and shows that the guess is favourable. 

Figure 5: An efe-histogram for Example 3.2 using 2000 observations and a bin interval of 0.05. 
The efe value of the approximate solution is shown (triangular marker). 

In the next two examples the theory is applied to systems where the probability distribution P, in Definition 1.1, is uniform 
over $\Omega_{S,V}$. These two examples can be compared with Example 3.1. 

Example 3.3. In this example #T = 200 and the system is comprised of five nodes with a four state node repertoire ($\#\Omega_{S,V} = 
1024$), as is the case in Example 3.1. However, in the present example, the elements of T are chosen uniformly at random from 
$\Omega_{S,V}$. Figure 6 provides an efe-histogram for T. The binary search algorithm was also applied to T and completed thirteen 
cycles. The efe-histogram is not negatively skewed and the difference between the efe value of the approximate solution, found 
by the binary search algorithm, and the mean of the efe-histogram is only 0.62, to three sf, compared to 4.26 for Example 3.1. 
A second choice for the elements of T was then made uniformly at random from $\Omega_{S,V}$. The approximate solution, found by 
the binary search algorithm, for the second choice of T was very different to that of the first choice of T. 



Eigure 6: An efe-histogram for Example l3.3l using 200,000 observations and a bin interval of 
0.01. Eor each cycle of the binary search algorithm, the efe value of the approximate solution 
obtained is shown {triangular marker). 
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Example 3.4. In this example the system is again comprised of five nodes with a four state node repertoire. However, 
#T — 1024 such that there is exactly one observation of each element of Q-sy in T. In this case if we take the probability 
distribution P, in Deftnition \l.l\ to be uniform over iljy then efe(-,-,r) = efe(-,-,P); see Definition \1.5\ In particular, if 
we let T' denote the typical data in Example 13.51 then the present example is the limit case for Example 15.51 as #T' —^ oo_ 
Figure^provides an eft-histogram for T. The binary search algorithm was applied to T but stopped before completing one 
cycle because the minimum of the efe values, obtained in Step 2 of the algorithm, was given by many of the pairs of weighted 
relations tested in Step 2. This is due to a type of symmetry within T which we now consider. 

We can represent T in the form of a table with each row corresponding to a typical data element; e.g. see Tables\I\and^ 
A transformation of T can be made by, for example, switching the content of columns 3 and 4 of T, which is equivalent to 
switching round the node labels at the top of the columns. Table^presents one of the pairs of weighted relations that gave 
the minimum efe value obtained in Step 2 of the algorithm. A transformation of R in Table^can be made by switching the 
content of columns 3 and 4, and then switching the content of rows 3 and 4. Clearly, the efe is invariant under performing 
both the transformation to T and the transformation to R. Now, because T is comprised of exactly one observation of each 
element of Tlg y, the rows of the transformed version of T can be reordered to give back T before the transformation was 
made. Since efe is invariant regarding the order of typical data elements, the efe value given by T relative to U and the 
transformed version of R is the same as efe{R,U,T). Since R and its transformed version are different, the minimum of the 
efe values, obtained in Step 2 of the algorithm, is given by more than one of the pairs of weighted relations tested in Step 2. 
The same argument also applies to the solutions to (O and, consequently, these solutions vary greatly with respect to di. 

Also in the present example, for every fixed U and R, the transformation on T is a type o/efe preserving involution (i.e. T 
has a type of symmetry). More generally beyond the present example, the extent to which T is invariant, up to the order of 



Figure 7: An efe-histogram for Examr)le l3.4l using 200,000 observations and a bin interval of 

0 . 0002 . 

Table 6: One of the pairs of weighted relations, in Example 3.4, that gave the minimum efe
value obtained in Step 2 of the binary search algorithm.

U        v1       v2       v3       v4
v1       1        0.75     0.75     0.25
v2       0.75     1        0.25     0.25
v3       0.75     0.25     1        0.25
v4       0.25     0.25     0.25     1

R        node 1   node 2   node 3   node 4   node 5
node 1   1        0.25     0.25     0.75     0.25
node 2   0.25     1        0.75     0.25     0.25
node 3   0.25     0.75     1        0.25     0.25
node 4   0.75     0.25     0.25     1        0.25
node 5   0.25     0.25     0.25     0.25     1

its rows following such transformations, may be important regarding the shape of the efe-histogram.

Remark 3.1. Note that, in Example 3.4, the involution on T can be put into a broader context as an element of a group of
permutations of the contents of the columns of T. Similarly, the transformation applied to R is an element of a group of such
transformations on $\Psi_S$. There is also a similar group of transformations on $\Psi_V$. Beyond Example 3.4, for a given system it
may be that such a transformation on $\Psi_S$ acts almost as the identity on the solutions to (1). In this case the system has defined
geometry on S, under the requirement that the efe is minimised, that has a symmetry such as a rotation or reflection etc.

Upon consideration of the positively skewed efe-histogram in Figure 7, the reader might ask why we do not look for pairs
of weighted relations that maximise efe instead of minimise it. For every given system, the weighted relations $U \in \Psi_V$ and
$R \in \Psi_S$ that maximise efe are the constant functions which everywhere take the value 1; see Definition 1.5.
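
The claim about constant weighted relations can be checked numerically on a toy system. The following is a minimal sketch only: it assumes that Definition 1.5 (not restated in this section) has the form fe(R,U,S_i) = log2 #{S_j : d(R,R{U,S_j}) <= d(R,R{U,S_i})}, that R{U,S_i}(a,b) = U(f_i(a),f_i(b)) where f_i gives the node states of S_i, that d is the entrywise d_1 metric, and that efe is approximated by averaging fe over the typical data. All function names and the toy data are hypothetical.

```python
from itertools import product
from math import log2

def induced_relation(U, state):
    """Assumed form of R{U,S_i}: relate nodes a and b by U applied to their states."""
    n = len(state)
    return [[U[state[a]][state[b]] for b in range(n)] for a in range(n)]

def d1(A, B):
    """Entrywise l1 distance between two weighted-relation tables."""
    return sum(abs(x - y) for row_a, row_b in zip(A, B) for x, y in zip(row_a, row_b))

def float_entropy(R, U, s_i, all_states):
    """log2 of the number of states S_j with d(R, R{U,S_j}) <= d(R, R{U,S_i})."""
    bound = d1(R, induced_relation(U, s_i))
    count = sum(1 for s_j in all_states if d1(R, induced_relation(U, s_j)) <= bound)
    return log2(count)

def expected_float_entropy(R, U, typical_data, all_states):
    """Average float entropy over the typical data (an empirical stand-in for efe)."""
    return sum(float_entropy(R, U, s, all_states) for s in typical_data) / len(typical_data)

# Toy system (hypothetical): 3 nodes, 2 node states, so 2^3 = 8 system states.
n_nodes, n_values = 3, 2
all_states = list(product(range(n_values), repeat=n_nodes))
typical_data = [(0, 0, 1), (0, 0, 0), (1, 1, 1)]

ones_U = [[1.0] * n_values for _ in range(n_values)]
ones_R = [[1.0] * n_nodes for _ in range(n_nodes)]
identity_U = [[1.0, 0.0], [0.0, 1.0]]

efe_constant = expected_float_entropy(ones_R, ones_U, typical_data, all_states)
efe_other = expected_float_entropy(ones_R, identity_U, typical_data, all_states)
print(efe_constant, efe_other)
```

Under these assumptions the constant relations make every distance zero, so every state is counted and the efe equals log2(8) = 3, the largest value possible for eight states; the second choice of U gives a strictly smaller value.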


In the next example the typical data is obtained from colour digital photographs. 


Example 3.5. In this example 600 digital photographs of the world around us are used. The typical data is obtained using
exactly the method shown in Figure 2, where the photographs have a nine colour red/green palette. Hence, #T = 600 and
the system is comprised of five nodes with a nine state node repertoire ($\#\Omega_{S,V} = 9^5 = 59049$). The system is too large to apply the
binary search algorithm. Hence, in this case, approximate solutions to (1) are guessed and their associated efe values are
compared with an efe-histogram for the system. Table 7 presents the guess for R; the right hand side of Figure\^ provides
a graph illustration for R. Table 8 gives two different guesses, U' and U, for the weighted relation on V (note that the node
repertoire labels are of the form red,green, i.e. 255,0 is the label for pure red). Figure 8 provides a graph illustration of

Table 7: Guess for R in Example 3.5.

R        node 1   node 2   node 3   node 4   node 5
node 1   1        0.95     0.65     0.95     0.75
node 2   0.95     1        0.95     0.75     0.95
node 3   0.65     0.95     1        0.60     0.75
node 4   0.95     0.75     0.60     1        0.95
node 5   0.75     0.95     0.75     0.95     1


the weighted relations in Table 8. Figure 9 provides an efe-histogram for T and, whilst U' is an obvious first guess, it is
U that gives the lower efe value. With respect to U, elements of V of the form x,x+a, where a is constant, are more
strongly related than elements of the form x,a-x; i.e. with respect to U, the representative of pure red is very distinct from

Table 8: Guesses for the weighted relation on V in Example 3.5.

U'        0,0       128,0     255,0     0,128     128,128   255,128   0,255     128,255   255,255
0,0       1         0.45      0.15      0.45      0.25      0.10      0.15      0.10      0.05
128,0     0.45      1         0.45      0.25      0.45      0.25      0.10      0.15      0.10
255,0     0.15      0.45      1         0.10      0.25      0.45      0.05      0.10      0.15
0,128     0.45      0.25      0.10      1         0.45      0.15      0.45      0.25      0.10
128,128   0.25      0.45      0.25      0.45      1         0.45      0.25      0.45      0.25
255,128   0.10      0.25      0.45      0.15      0.45      1         0.10      0.25      0.45
0,255     0.15      0.10      0.05      0.45      0.25      0.10      1         0.45      0.15
128,255   0.10      0.15      0.10      0.25      0.45      0.25      0.45      1         0.45
255,255   0.05      0.10      0.15      0.10      0.25      0.45      0.15      0.45      1

U         0,0       128,0     255,0     0,128     128,128   255,128   0,255     128,255   255,255
0,0       1         0.11      0.04      0.11      0.45      0.09      0.04      0.09      0.16
128,0     0.11      1         0.11      0.04      0.11      0.45      0.02      0.04      0.09
255,0     0.04      0.11      1         0.02      0.04      0.11      0.015     0.02      0.04
0,128     0.11      0.04      0.02      1         0.11      0.04      0.11      0.45      0.09
128,128   0.45      0.11      0.04      0.11      1         0.11      0.04      0.11      0.45
255,128   0.09      0.45      0.11      0.04      0.11      1         0.02      0.04      0.11
0,255     0.04      0.02      0.015     0.11      0.04      0.02      1         0.11      0.04
128,255   0.09      0.04      0.02      0.45      0.11      0.04      0.11      1         0.11
255,255   0.16      0.09      0.04      0.09      0.45      0.11      0.04      0.11      1


Figure 8: Graph illustration of the weighted relations in Table 8 showing strongest relationships (solid lines),
intermediate relationships (dashed lines) and, for U only, weak intermediate relationships (dotted lines).



Figure 9: An efe-histogram for Example 3.5 using 5000 observations and a bin interval of 0.05. The values
efe(R,U',T) = 7.97695 and efe(R,U,T) = 7.28947, to six sf, are shown (triangular marker).

the representative of pure green. We note that, given the efe-histogram and the system, R and U appear to be favorable
and appropriate weighted relations. However, R and U are still only guesses and actual solutions to (1) could be somewhat
different. Ideally we would use the binary search algorithm on a similar system but with a larger node repertoire and a larger
set of typical data, but this comes at a high computational expense.

We now have the first of three investigations concerning the theory. 


3.1 Changing the base of a system 

In this subsection we look at base changing operations, base branching structure, and the effect of changing base on
efe-histograms.


3.1.1 Base changing operations 

Here we look at two different types of base changing operations. One of the types of operations involves combining nodes 
whilst the other involves splitting nodes. Many operations of the same type are equivalent in the sense that the resulting 
systems only differ in the choice of labels used for nodes or repertoire elements. Furthermore, every combining operation is 
the inverse of some splitting operation and vice versa. As an example, suppose we have a system with #S = 6 and #V = 2.
Table 9 shows one of the ways to combine the nodes so that the resulting system has #S' = 3 and #V' = 4. More generally, from

Table 9: Example of changing the base of a system.

                        node 1   node 2   node 3   node 4   node 5   node 6
S_τ(1)                  v2       v2       v1       v1       v2       v1

                        node 1             node 2             node 3
Node allocation         (node 1, node 4)   (node 5, node 2)   (node 6, node 3)
S'_τ(1)                 (v2, v1)           (v2, v2)           (v1, v1)

                        v'1        v'2        v'3        v'4
Repertoire allocation   (v2, v2)   (v2, v1)   (v1, v1)   (v1, v2)

                        node 1   node 2   node 3
S'_τ(1)                 v'2      v'1      v'3


Table 9 we see that there are 6! different possible node allocations and 4! different possible repertoire allocations. Hence,
in this case, the total number of such combining operations (or splitting operations if reversing the process) is 6!4! = 17280,
and this corresponds to the fact that $\#\Omega_{S,V} = 2^6 = 4^3 = \#\Omega_{S',V'}$. Similarly we note that $\#\Omega_{S,V} = 2^6 = 8^2$ so that there are 6!8!
different combining operations resulting in systems with two nodes and an eight state node repertoire.
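
The counts above, and the effect of one combining operation on a single state, can be reproduced with a short sketch. The node state and the node allocation are those of Table 9; the left-to-right labelling v'1, ..., v'4 of the repertoire allocation is an assumption about the table layout, and the function names are hypothetical.

```python
from math import factorial

def count_operations(n_nodes, n_new_values):
    """Number of (node allocation, repertoire allocation) pairs: n_nodes! * n_new_values!."""
    return factorial(n_nodes) * factorial(n_new_values)

def combine(state, node_allocation, repertoire_allocation):
    """Apply a combining operation to one system state.

    state: dict mapping node label to node state.
    node_allocation: list of tuples of old nodes, one tuple per new node.
    repertoire_allocation: dict mapping a tuple of old node states to a new state label.
    """
    return [repertoire_allocation[tuple(state[node] for node in group)]
            for group in node_allocation]

# The state in the first row of Table 9.
state = {1: "v2", 2: "v2", 3: "v1", 4: "v1", 5: "v2", 6: "v1"}
node_allocation = [(1, 4), (5, 2), (6, 3)]
# Assumed ordering of the new repertoire labels.
repertoire_allocation = {
    ("v2", "v2"): "v'1",
    ("v2", "v1"): "v'2",
    ("v1", "v1"): "v'3",
    ("v1", "v2"): "v'4",
}

new_state = combine(state, node_allocation, repertoire_allocation)
print(count_operations(6, 4), count_operations(6, 8), new_state)
```

Since the operation is a relabelling, the number of system states is unchanged: 2^6 = 4^3 = 8^2 = 64.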

Such operations do have an effect on efe-histograms. Indeed, since $\#\Omega_{S,V} = 2^6 = 64^1$ we can apply a combining operation
that results in a system with one node and a node repertoire that has a state for every state of the system. The resulting
efe-histogram has a standard deviation of zero and is located at the maximum possible efe value for a system with 64 states,
which is $\log_2(64) = 6$; see Definition 1.5 and Subsection 2.3. We will further consider the effect of base changing operations
on efe-histograms in Subsection 3.1.3.


3.1.2 Base branching structure 

We have already noted in Subsection 3.1.1 that many base changing operations are equivalent in the sense that the resulting
systems only differ in the choice of labels used for nodes or repertoire elements. The advantage of this redundancy, for
appropriate systems, is that it allows us to apply a splitting operation in the first instance (i.e. we can start with a repertoire
allocation) instead of being restricted to combining operations. Alternatively we can avoid this redundancy by treating a
system in its initial base as being at the bottom of a branching structure which branches under combining nodes such that
each branch terminates with the system represented by a single node. Table 10 shows one such branch. We note that, with
regard to weighted relations on the nodes, the order of the columns in Table 10 is not important as long as column heading and
column contents are kept together. Furthermore, there is no repertoire allocation since we retain the vector form of the node
states. These simplifications reduce the number of combining operations discussed in Subsection 3.1.1 from 17280 to 120.
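
The text does not spell out how 120 arises. One reading that is consistent with it, offered here as an assumption, is that a combining operation now amounts to a set of three ordered pairs of nodes: the order inside a pair matters because the vector form of the node states is retained, whereas the order of the three columns does not. That gives 6!/3! = 120, which the following sketch confirms by enumeration.

```python
from itertools import permutations

nodes = (1, 2, 3, 4, 5, 6)

# Each permutation is cut into three ordered pairs; the set of pairs is unordered.
branches = set()
for perm in permutations(nodes):
    pairs = [(perm[0], perm[1]), (perm[2], perm[3]), (perm[4], perm[5])]
    branches.add(frozenset(pairs))

# The branch shown in Table 10, written as an unordered set of ordered pairs.
table_10_branch = frozenset([(1, 4), (5, 2), (6, 3)])
print(len(branches), table_10_branch in branches)
```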

Table 10: One branch of a base branching structure.

                node 1   node 2   node 3   node 4   node 5   node 6
S_τ(1)          v2       v2       v1       v1       v2       v1

Branch          (node 1, node 4)   (node 5, node 2)   (node 6, node 3)
S'_τ(1)         (v2, v1)           (v2, v2)           (v1, v1)

End of Branch   ((node 5, node 2), (node 1, node 4), (node 6, node 3))
S''_τ(1)        ((v2, v2), (v2, v1), (v1, v1))


Now the definition of float entropy in Definition 1.5 uses only one base for a system. However, multi-relational float entropy
(see Subsection 4.1) involves more weighted relations by involving more than one base. For some systems it may be that
particular bases are important regarding weighted relations that minimise efe and/or regarding maximising the length of the
left tail of the efe-histogram. Indeed, we have already noted that combining all of the nodes of a system into a single node is
not good in this respect, showing that other bases are preferable; see Subsection 3.1.1. Moreover, a change of base may allow
a system to define weighted relations at a higher level of meaning. For example, the solutions to (1) may define (to a high
resolution) a weighted relation R on the nodes of some system, giving two dimensional geometry. For a particular branch of
the base branching structure, the states of the composite nodes will be images under the geometry (given by R) on the nodes
that have been combined. Hence, under the requirement of further minimising efe, the system defines a weighted relation on
the repertoire of the composite nodes and hence on a set of images; see Subsection 4.1. This may have relevance to some
aspects of the Gestalt theory of visual perception; see a.

Comparing base changing operations with base branching structure we note that allowing arbitrary application of successive 
combining and splitting operations may provide too much freedom in the sense that a system may then define too many 
weighted relations (under requirements such as the minimisation of efe) to specify a single consistent interpretation of the 
states of the system. Hence, restricting the theory to the base branching structure may be desirable (or perhaps at least to some 
further generalisation of the base branching structure). In spite of this we will now look at the effect of combining nodes and
splitting nodes on efe-histograms.


3.1.3 Base and efe-histograms 


The following lemma says that uniform randomness is preserved by both combining and splitting operations. 


Lemma 3.1. Suppose we have a system where the probability distribution P in Definition 1.1 is uniform over $\Omega_{S,V}$. For any
given combining or splitting operation, as described in Subsection 3.1.1, let S' and V' (with #V' as small as possible) be
such that $\Omega_{S',V'}$ is the codomain of $\Omega_{S,V}$ under the operation. Furthermore, for $S'_i$ an element of the image of $\Omega_{S,V}$ under the
operation, define $P'(S'_i) := P(A_i)$, where $A_i$ is the preimage of $S'_i$. Then P' is a uniform probability distribution over $\Omega_{S',V'}$.

Proof. Immediate since the operation is a bijection from $\Omega_{S,V}$ to $\Omega_{S',V'}$. □

We now consider the case where P is far from uniform over $\Omega_{S,V}$. Because computational resources are limited, a choice
had to be made between looking at the effects of many different base changing operations on just one system and looking at
the effects of one or two different base changing operations per system for several different systems. The latter was chosen in
order to avoid accidentally giving results for some highly unusual system. Typical data was obtained for each of the systems
from digital photographs. The method in Figure 1 was used except the number of shades in the gray scale, the location of
the sampling grid and the number of nodes involved varied from system to system; details are given on the left-hand side of
Table 11.

Table 11: Seven systems from which efe observations were taken both before and after the
application of a base changing operation.

System   #S   #V   #efe-observations   Operation   #S'   #V'   #efe-observations
1        6    2    400000              combine     3     4     400000
2        3    4    5000                split       6     2     5000
3        3    4    5000                split       6     2     5000
4        3    9    400000              split       6     3     400000
5        4    9    100000              split       8     3     100000
6        6    3    400000              combine     3     9     400000
7        6    3    400000              combine     3     9     400000


For each of the systems, 400 typical data elements were collected. Subsequently a large number of efe observations were 

[Figure panels: Skewness; Mean-Minimum; Shift in minimum; Shift in median. Legend: Systems 1 to 7.]

Figure 10: For each system the figure shows: the skewness, and mean minus minimum, of the efe-histogram
when using the original base (x axis) and after changing to the alternative base (y axis); the shift in the
minimum and the shift in the median when changing back to the original base from the alternative base.

obtained using the method described in Subsection 2.3. The same number of efe observations was then obtained having
applied a base changing operation to the typical data. Apart from the size of base (i.e. the size of the repertoire #V' in
Table 11), the base changing operation was chosen at random for each system. With respect to the seven systems in Table 11
we note that System 3 is actually the same as System 2 in the sense that the same typical data is used. However, a different
base changing operation has been applied to System 3 than that applied to System 2. Similarly, System 6 and System 7 are the
same but have had different base changing operations applied. For each system, Figure 10 compares statistics obtained from
the efe observations, made before the change of base, with statistics obtained from the efe observations made after the change
of base. Note that, in Figure 10, skewness is measured using the adjusted Fisher-Pearson standardized moment coefficient.
For each of the systems investigated it can be seen from Figure 10 that, when changing back to the original base from the
alternative base, the efe-histogram undergoes an increase in negative skewness and mean minus minimum as well as a right
shift in location. Furthermore, for most of the systems, the minimum efe value observed, when using the original base, is to
the left of the minimum efe value observed when using the alternative base. These results suggest that the bases maximising
the length of the left tail of the efe-distribution (here approximated by an efe-histogram) are important for the theory presented
in the present paper. One caveat concerning this investigation is that, for each system, the variance in the observed minimum
might be rather high because the distribution being sampled has a very thin left tail. We now move on to our next investigation.
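
For readers wishing to reproduce the statistics compared in Figure 10, the following is a minimal sketch of the two summary measures, using the standard formula for the adjusted Fisher-Pearson standardized moment coefficient, G1 = sqrt(n(n-1))/(n-2) * m3/m2^(3/2), where m2 and m3 are the second and third central sample moments. The sample values are made up purely for illustration and are not efe observations from the paper.

```python
from math import sqrt

def adjusted_skewness(xs):
    """Adjusted Fisher-Pearson standardized moment coefficient G1."""
    n = len(xs)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1

def mean_minus_minimum(xs):
    """Distance from the sample mean down to the smallest observation."""
    return sum(xs) / len(xs) - min(xs)

# Hypothetical sample with a long left tail.
left_tailed = [5.0, 7.9, 8.0, 8.1, 8.2]
print(adjusted_skewness(left_tailed), mean_minus_minimum(left_tailed))
```

A long left tail shows up as negative skewness together with a large mean minus minimum, which is the combination the investigation above looks for.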

3.2 Joining and partitioning systems


Consider the visual cortex and the auditory cortex of the brain. There is evidence that the brain defines relationships between
the states of these different brain regions at a high level of meaning (i.e. between images and sounds); see [6]. However, at the
lower level of meaning at which the images and the sounds are defined it may be that the two brain regions are self contained.
The brains of two different people are perhaps a more overt example of self containment or privacy.

In the context of the theory of the present paper, suppose we have two systems. We can solve (1) for both of the systems
separately and sum the resulting minimised efes. If this sum is significantly more than the minimum efe obtained when
joining the two systems then it makes sense to consider the two systems as a single system. Examples of such systems are
easy to construct. Conversely, for a given system, it might be possible to partition the set of nodes S such that the sum of the
minimum efes of the resulting systems is less than that of the original system. In this case, at least in the given basis, it makes
sense to consider the original system as several different systems. It is not so easy to find examples of such systems, at least
when the systems are small. However, Table 12 provides an example where the minimum efe of the system is greater than 3
whilst, after partitioning, the sum of the minimum efes is only 2.8. The result was obtained from the system by investigating

Table 12: Typical data of a system before and after a partition which results in lowering the total
minimum efe.

Before partition:

           node 1   node 2   node 3   node 4
S_τ(1)     v1       v1       v1       v1
S_τ(2)     v1       v1       v1       v3
S_τ(3)     v1       v1       v3       v1
S_τ(4)     v1       v1       v3       v3
S_τ(5)     v1       v1       v3       v3
S_τ(6)     v1       v2       v1       v1
S_τ(7)     v1       v2       v1       v3
S_τ(8)     v1       v2       v3       v1
S_τ(9)     v1       v2       v3       v3
S_τ(10)    v1       v2       v3       v3
S_τ(11)    v2       v1       v1       v1
S_τ(12)    v2       v1       v1       v3
S_τ(13)    v2       v1       v3       v1
S_τ(14)    v2       v1       v3       v3
S_τ(15)    v2       v1       v3       v3
S_τ(16)    v2       v1       v1       v1
S_τ(17)    v2       v1       v1       v3
S_τ(18)    v2       v1       v3       v1
S_τ(19)    v2       v1       v3       v3
S_τ(20)    v2       v1       v3       v3
S_τ(21)    v2       v2       v1       v1
S_τ(22)    v2       v2       v1       v3
S_τ(23)    v2       v2       v3       v1
S_τ(24)    v2       v2       v3       v3
S_τ(25)    v2       v2       v3       v3

After partition:

           node 1   node 2               node 3   node 4
S'_τ(1)    v1       v1       S''_τ(1)    v1       v1
S'_τ(2)    v1       v2       S''_τ(2)    v1       v3
S'_τ(3)    v2       v1       S''_τ(3)    v3       v1
S'_τ(4)    v2       v1       S''_τ(4)    v3       v3
S'_τ(5)    v2       v2       S''_τ(5)    v3       v3


an efe-histogram involving $4 \times 10^6$ observations, and by running the binary search algorithm. Note that the typical data of the
system is such that the partitioned systems are independent when considered as random variables (this is why the number of
typical data elements can be divided by five after partitioning).
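
The independence noted above can be checked directly from Table 12. The sketch below takes the two partitioned typical data sets as read from the table, forms every pairing of their elements, and confirms that the empirical joint distribution factorises into the product of the two marginals; the variable and function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Typical data of the two partitioned systems, as read from Table 12.
s_prime = [("v1", "v1"), ("v1", "v2"), ("v2", "v1"), ("v2", "v1"), ("v2", "v2")]
s_double_prime = [("v1", "v1"), ("v1", "v3"), ("v3", "v1"), ("v3", "v3"), ("v3", "v3")]

# Pairing every element of one with every element of the other gives 25 joint rows.
joint = [a + b for a, b in product(s_prime, s_double_prime)]

def empirical(rows):
    """Empirical probability of each distinct row."""
    counts = Counter(rows)
    return {row: count / len(rows) for row, count in counts.items()}

p_joint = empirical(joint)
p_first = empirical(s_prime)
p_second = empirical(s_double_prime)

independent = all(
    abs(p_joint[a + b] - p_first[a] * p_second[b]) < 1e-12
    for a in p_first
    for b in p_second
)
print(len(joint), independent)
```

The 25 joint rows reduce to 5 + 5 rows after partitioning, which is the reduction by a factor of five mentioned above.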

The number of different partitions that there are of a system with n nodes is given by the Bell number $B_n$. For #S = n we
have $B_n = \frac{d^n}{dx^n} B(x) \big|_{x=0}$, where $B(x) = \exp(e^x - 1)$ is the generating function for the Bell numbers; see Q. This number quickly
becomes large as n increases, making the investigation of all the different partitions of a system computationally expensive
for all but small systems. In the final investigation of this section we consider the metric used in Definition 1.5.
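
To give a feel for how quickly the number of partitions grows, the following sketch computes Bell numbers with the Bell triangle, a standard recurrence that is used here instead of the generating function for convenience. The function name is hypothetical.

```python
def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max] using the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left in the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells

print(bell_numbers(10))
```

For instance, a system with four nodes, such as the one in Table 12, has B_4 = 15 partitions, whereas ten nodes already give B_10 = 115975.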


3.3 Metric independence 


Remark [T3] suggests that the theory presented in the present paper is independent of the choice of metric used in Definition 1.5,
provided that the metric determines a total order on [0,1] in some natural way. Before considering this in more detail, we
have the following example.


Example 3.6. This example uses the typical data T that was obtained in Example 3.1. The binary search algorithm of
Subsection 2.2 was again applied to T but this time $d_\infty$ was used in Definition 1.5 instead of $d_1$. After four cycles, the
weighted relations in Table 13 were returned with efe(R,U,T) = 5.30610 using $d_\infty$. We see that U in Table 13 yields the same

Table 13: Approximate solution for Example 3.6 using $d_\infty$ in Definition 1.5.

U         0         147.224   294.449   441.673
0         1         0.53125   0.28125   0.03125
147.224   0.53125   1         0.65625   0.34375
294.449   0.28125   0.65625   1         0.59375
441.673   0.03125   0.34375   0.59375   1

R         node 1    node 2    node 3    node 4    node 5
node 1    1         0.96875   0.84375   0.90625   0.90625
node 2    0.96875   1         0.84375   0.90625   0.96875
node 3    0.84375   0.84375   1         0.84375   0.84375
node 4    0.90625   0.90625   0.84375   1         0.90625
node 5    0.90625   0.96875   0.84375   0.90625   1


graph illustration as that given by U in Table 5 from Example 3.1. The same cannot be said when comparing R in Table 13
with R in Table 5, although there are some similarities. However, it also turns out that U and R in Table 5 are a better
approximate solution to (1) in the present example, i.e. when using $d_\infty$ instead of $d_1$, than that given by U and R in Table 13.
Indeed the efe drops from 5.30610 to 5.27370 bpe.

The result in Example 3.6 is perhaps not surprising once we appreciate certain similarities between $d_\infty$ and $d_1$. To
appreciate these similarities and further results, the following assumption will be useful.

Assumption 3.1. Under this assumption, for a metric $d$ on $[0,1]^n$, there exists a metric $d'$ on $[0,1]$ such that for all
$1 \le i \le n$, $(c_1,\ldots,c_{i-1},c_{i+1},\ldots,c_n) \in [0,1]^{n-1}$ and $a,b \in [0,1]$ we have

$$d((c_1,\ldots,c_{i-1},a,c_{i+1},\ldots,c_n),(c_1,\ldots,c_{i-1},b,c_{i+1},\ldots,c_n)) = d'(a,b).$$

Furthermore, there is $a_{\min} \in [0,1]$ (e.g. classically $a_{\min} = 0$) such that $\le_d$, given by

$$a \le_d b \iff d'(a_{\min},a) \le d'(a_{\min},b), \qquad (2)$$

is a total order on [0,1] and (up to reverse ordering) no other choice of $a_{\min} \in [0,1]$ in (2) gives a different total order.
Moreover, $\le_d$ determines a maximum element $a_{\max} \in [0,1]$ (e.g. classically $a_{\max} = 1$) which, if using $d$ in Definition 1.5 and
$\le_d$ in the interpretation of weighted relations, would be used in the definition of a reflexive weighted relation in Definition 1.2.

Remark 3.2. Assumption 3.1 gives rise to the following remarks.

1. Under Assumption 3.1 it can be argued that $d$ determines a single well defined metric on [0,1] and that this metric is $d'$.

2. Furthermore, $d'$ (and hence $d$) determines intervals in [0,1], i.e.

$$[a,b]_d := \{c \in [0,1] : a \le_d c \le_d b\} \quad \text{for } a,b \in [0,1],$$

and the length of such intervals is given by $d'(a,b)$.

3. With the above two remarks in place, we note that, in each coordinate, $d$ defines $d$-uniform random variables on [0,1],
i.e. if $[a,b]_d$ and $[c,e]_d$ are of the same length then the probability of a $d$-uniform random variable taking a value in
$[a,b]_d$ is the same as it taking a value in $[c,e]_d$.

4. To appreciate some of the similarities between $d_\infty$ and $d_1$, note that all of the metrics given in Definition 1.4 satisfy
Assumption 3.1 and that $d'_1(a,b) = |a - b| = d'_\infty(a,b)$ for all $a,b \in [0,1]$. There are also some important differences
between $d_1$ and $d_\infty$; we will look at some of these shortly.

We now return to the issue of metric independence. 

Lemma 3.2. Suppose Definition 1.5 uses a metric $d$ that satisfies Assumption 3.1 and let $f : [0,1] \to [0,1]$ be a bijection.
Then $d^f$, given by

$$d^f((a_1,\ldots,a_n),(b_1,\ldots,b_n)) := d((f(a_1),\ldots,f(a_n)),(f(b_1),\ldots,f(b_n))),$$

is also a metric on $[0,1]^n$ and the theory in the present paper is independent of a change of metric from $d$ to $d^f$ in Definition 1.5
provided that, in Definition 1.2, $f^{-1}(a_{\max})$ is used in the definition of a reflexive weighted relation, $\le_{d^f}$ is used in place of
$\le_d$ in the interpretation of values given by weighted relations and, when obtaining efe-histograms (see Subsection 2.3),
$d^f$-uniform random variables are used instead of $d$-uniform random variables.

Proof. Lemma 3.2 follows immediately from the fact that $d^f$ is merely $d$ under relabeling each $a \in [0,1]$ with $f^{-1}(a)$. □
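
The relabeling in the proof can be illustrated numerically. The sketch below is a toy check under stated assumptions: d is taken to be d_1, the bijection is the hypothetical choice f(x) = x^2 on [0,1], and the two tuples stand in for the entries of weighted relation tables. It confirms that relabeling the entries by f^{-1} and measuring with d^f reproduces the distance measured with d.

```python
from math import isclose, sqrt

def d1(a, b):
    """The l1 metric on tuples of values in [0, 1]."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pull_back(d, f):
    """Return d^f, defined by d^f(a, b) = d(f(a), f(b)) with f applied coordinatewise."""
    def d_f(a, b):
        return d(tuple(f(x) for x in a), tuple(f(x) for x in b))
    return d_f

def f(x):
    return x * x       # hypothetical bijection of [0, 1]

def f_inv(x):
    return sqrt(x)     # its inverse on [0, 1]

d_f = pull_back(d1, f)

table_a = (1.0, 0.25, 0.81)
table_b = (0.49, 1.0, 0.04)
relabelled_a = tuple(f_inv(x) for x in table_a)
relabelled_b = tuple(f_inv(x) for x in table_b)

same_distance = isclose(d_f(relabelled_a, relabelled_b), d1(table_a, table_b))
reflexive_value = f_inv(1.0)  # f^{-1}(a_max) with a_max = 1
print(same_distance, reflexive_value)
```

With this particular f the maximum element is unchanged; a bijection that moves 1 would change the value used for reflexivity, as the lemma states.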

It is expected that a more general result than Lemma 3.2 is possible such that $d'$ in Assumption 3.1 may have dependence
on $(c_1,\ldots,c_{i-1},c_{i+1},\ldots,c_n) \in [0,1]^{n-1}$ for $1 \le i \le n$. In this case, the interpretation of each value in a weighted relation
table would be dependent on the other values in that table; the interpretation itself is determined by the metric $d$ being used.
To appreciate one of the differences between $d_1$ and $d_\infty$, we require the following definition.

Definition 3.1. Let $d$ be a metric on $[0,1]^n$ conforming to Assumption 3.1. Moreover, let $a,b,c \in [0,1]^n$ be such that
$d'(a_i,c_i) \ge d'(b_i,c_i)$, for $i = 1,\ldots,n$, and for one or more $i$ we have $d'(a_i,c_i) > d'(b_i,c_i)$. If for all such $a,b,c \in [0,1]^n$ we
have $d(a,c) > d(b,c)$ then $d$ is called an increasing function of coordinatewise distance.

The metric $d_1$ is an increasing function of coordinatewise distance but, for $n > 1$ in Definition 3.1, $d_\infty$ is not; indeed
$d_\infty(a,c) = 1 = d_\infty(b,c)$ for $n = 2$, $a = (1,0.5)$, $b = (1,0)$ and $c = (0,0)$.
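
The counterexample, and the shared one-coordinate behaviour of the two metrics, can be verified with a few lines. This is a minimal sketch; the function names and the test points other than a, b and c are hypothetical.

```python
def d1(a, b):
    """The l1 metric: sum of coordinatewise distances."""
    return sum(abs(x - y) for x, y in zip(a, b))

def d_inf(a, b):
    """The l-infinity metric: largest coordinatewise distance."""
    return max(abs(x - y) for x, y in zip(a, b))

def one_coordinate_distance(d, i, a, b, others):
    """Distance under d between two points that differ only in coordinate i."""
    p, q = list(others), list(others)
    p.insert(i, a)
    q.insert(i, b)
    return d(p, q)

# Both metrics reduce to |a - b| when only one coordinate differs.
same_restriction = (
    one_coordinate_distance(d1, 1, 0.7, 0.4, [0.2, 0.9])
    == one_coordinate_distance(d_inf, 1, 0.7, 0.4, [0.2, 0.9])
    == abs(0.7 - 0.4)
)

# The counterexample from the text: a is coordinatewise at least as far from c
# as b is, and strictly farther in the second coordinate.
a, b, c = (1.0, 0.5), (1.0, 0.0), (0.0, 0.0)
print(same_restriction, d1(a, c), d1(b, c), d_inf(a, c), d_inf(b, c))
```

Here d_1 separates the two points (1.5 against 1.0) whereas d_infinity assigns both the distance 1.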

It is hoped that, upon further investigation, a class of metrics will emerge as being optimal (in some natural way) in
the context of the theory of the present paper. Hence, independence arguments would then only need to apply to this class
of metrics. It may be that being an increasing function of coordinatewise distance is a necessary condition for a metric to be
optimal, but this is for future work. Regarding the theory in the present paper, it is certainly the case that the meaning of the
values in weighted relation tables is given by the characteristics of the metric being used in Definition 1.5.

We conclude this section with a reminder of the variety of different metrics that there are on $\mathbb{R}^n$. Lemma 3.3 shows that, even
when restricting attention to metrics that are equivalent to $d_2$, the variety is great.

Lemma 3.3. Let $d_2$ be the Euclidean metric on $\mathbb{R}^n$, $n \in \mathbb{N}$. For all $a,b \in \mathbb{R}^n$, define $d^f(a,b) := d_2(f(a),f(b))$, where
$f : \mathbb{R}^n \to \mathbb{R}^n$ is such that $f : (\mathbb{R}^n,d_2) \to (\mathbb{R}^n,d_2)$ is a homeomorphism. Then $(\mathbb{R}^n,d^f)$ is a metric space and $d^f$ is equivalent
to $d_2$; i.e. the open subsets of $(\mathbb{R}^n,d^f)$ are the same as those of $(\mathbb{R}^n,d_2)$.

Proof. Since $f : (\mathbb{R}^n,d_2) \to (\mathbb{R}^n,d_2)$ is a homeomorphism, $f : \mathbb{R}^n \to \mathbb{R}^n$ is a bijection. From this it easily follows that $d^f$ is a
metric. To show equivalence we need to show that $A \subseteq (\mathbb{R}^n,d^f)$ is open if and only if $A \subseteq (\mathbb{R}^n,d_2)$ is open. Hence, to show
only if, let $A \subseteq (\mathbb{R}^n,d^f)$ be open. In this direction it is enough to show that $f(A)$, as a subset of $(\mathbb{R}^n,d_2)$, is open since then
$A = f^{-1}(f(A)) \subseteq (\mathbb{R}^n,d_2)$ is open by $f : (\mathbb{R}^n,d_2) \to (\mathbb{R}^n,d_2)$ being a homeomorphism. Let $a' \in f(A)$. Then $a' = f(a)$ for
some $a \in A$. Since $A \subseteq (\mathbb{R}^n,d^f)$ is open, there exists $\varepsilon > 0$ such that for all $b \in \mathbb{R}^n$ with $d^f(a,b) < \varepsilon$ we have $b \in A$, and
thus $f(b) \in f(A)$. Hence, for all $b' \in \mathbb{R}^n$ with $d^f(a,f^{-1}(b')) < \varepsilon$ we have $f^{-1}(b') \in A$, and thus $f(f^{-1}(b')) = b' \in f(A)$.
Noting that $d^f(a,f^{-1}(b')) = d_2(f(a),f(f^{-1}(b'))) = d_2(a',b')$, it follows from the last statement that for all $b' \in \mathbb{R}^n$ with
$d_2(a',b') < \varepsilon$ we have $b' \in f(A)$. Hence, $f(A) \subseteq (\mathbb{R}^n,d_2)$ is open. The proof in the other direction is similar. □

Example 3.7. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ conform to the conditions of Lemma 3.3. If $f$ maps $\iota$ in Figure 11 to the unit circle in $(\mathbb{R}^2,d_2)$
and $f(0) = 0$ then $\iota \subset \mathbb{R}^2$ is the unit circle in $(\mathbb{R}^2,d^f)$.

Section 4 provides some generalisations of Definition 1.5, a comparison with both Integrated Information Theory and
Shannon entropy, followed by the conclusion.

Figure 11: The path $\iota$ in Example 3.7.


4 Generalisation, a comparison with IIT theory, and conclusion 

In Subsection 4.1 we extend Definition 1.5 to involve many more weighted relations.

4.1 Multi-relational float entropy and time dependents 

We start with a definition. 

Definition 4.1 (Multi-relational float entropy). Let S be as in Definition 1.1, let $U \in \Psi_V$, and let $R \in \Psi_S$. Furthermore, let
$U_1,U_2,\ldots$ and $R_1,R_2,\ldots$ be weighted relations analogous to U and R but for the system in different bases; see Subsection 3.1.2
on base branching structure. The multi-relational float entropy of a data element $S_i \in \Omega_{S,V}$, relative to $U,U_1,U_2,\ldots$ and
$R,R_1,R_2,\ldots$, is defined as

$$\mathrm{fe}(R,U,R_1,U_1,R_2,U_2,\ldots,S_i) := \log_2\big(\#\{S_j \in \Omega_{S,V} : C_0(R,U,R_1,U_1,R_2,U_2,\ldots,S_i,S_j) \wedge C_1(R,U,R_1,U_1,R_2,U_2,\ldots,S_i,S_j) \wedge \cdots\}\big),$$

where the first condition $C_0(R,U,R_1,U_1,R_2,U_2,\ldots,S_i,S_j)$ is $d(R,R\{U,S_j\}) \le d(R,R\{U,S_i\})$, as in Definition 1.5.

In Definition 4.1 all of the conditions $C_0,C_1,\ldots$ need to be satisfied for a data element $S_j$ to contribute toward the multi-
relational float entropy of a data element $S_i$. The additional conditions should be those that increase the length of the left tail
of the efe-distribution.
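
The way the conditions combine can be sketched generically. In the following, each condition is represented by a hypothetical callable taking (S_i, S_j) and returning True or False, with the weighted relations assumed to be already bound inside it. The two toy conditions on integers are placeholders chosen only to show the counting; they are not the conditions of Definition 4.1.

```python
from math import log2

def multi_relational_fe(conditions, s_i, all_states):
    """log2 of the number of states S_j that satisfy every condition relative to S_i.

    Assumes each condition holds when S_j = S_i, so that the count is at least 1.
    """
    count = sum(1 for s_j in all_states if all(cond(s_i, s_j) for cond in conditions))
    return log2(count)

# Toy placeholders (hypothetical): states are the integers 0..7.
all_states = range(8)

def c0(s_i, s_j):
    return s_j <= s_i          # stands in for d(R, R{U,S_j}) <= d(R, R{U,S_i})

def c1(s_i, s_j):
    return s_j % 2 == s_i % 2  # stands in for an analogous condition in another base

fe_single = multi_relational_fe([c0], 5, all_states)
fe_multi = multi_relational_fe([c0, c1], 5, all_states)
print(fe_single, fe_multi)
```

Because the conditions are joined by conjunction, adding a condition can only leave the count unchanged or reduce it, so the multi-relational value never exceeds the single-condition value.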

For example, for some given system (and under the requirement of minimising expected multi-relational float entropy), R
might be such that it defines two dimensional geometry on the nodes of the system. Furthermore, for a particular branch of the
base branching structure, the states of the composite nodes will be images under the geometry (given by R) on the nodes that
have been combined. Hence, for $C_1$ analogous to $C_0$ but using the new base, the system defines a weighted relation $U_1$ on
the repertoire of the composite nodes and hence on a set of images. This may have relevance to some aspects of the Gestalt
theory of visual perception; see Q.

Suppose that the system is part of the brain. As suggested in Subsection 3.2, at the level of meaning at which images
are defined by the visual cortex and sounds are defined by the auditory cortex, it may be that the two brain regions are
self contained; i.e. they may be separate systems. However, at a higher level of meaning (and for a particular branch of
the base branching structure), one of the nodes of the brain will be the whole visual cortex and another will be the whole
auditory cortex. The states of the visual cortex are visual objects and the states of the auditory cortex are sounds. Applying
the present theory appropriately should give a weighted relation on the two cortical regions and another giving relationship
values between objects and sounds. One caveat, however, is that in this case the two cortical regions as nodes do not share the
same node repertoire, and so some care needs to be taken when considering how to apply the definitions of the present paper.
There is also evidence of sparse coding in various cortical regions; see 0 and 0. For example, there are neurons that are
active if and only if activity related to a specific object (auditory or visual etc) is present in the respective cortex. Hence,
under the minimisation of efe also on this set of neurons, the system defines relationships between objects.

Finally, Definition 1.5 allows time dependent versions of the results presented in the present paper, and in general. Suppose in
Example 3.1 that the digital photographs sampled are in fact frames from videos. Choose an integer $k \in \mathbb{N}$. For each sampled
frame, sample in the same way the subsequent $k-1$ frames so that the number of nodes of the system has increased by a factor
of k (i.e. each node in each typical data element is replaced by k nodes that form a sequence of states of the original node
over a short time period). In this case, it is anticipated that if U and R solve (1) then R will define geometry on the nodes of
the system that has a dimension for time.

4.2 A comparison with Integrated Information Theory and Shannon entropy 

This subsection starts with an initial comparison between the theory of the present paper and Giulio Tononi’s Integrated
Information Theory (IIT) of consciousness. IIT has gained much attention in recent years (see cni, im, m, in and hd),
and it may be that the two theories are quite compatible in some areas. There is a significant difference in emphasis in the
formulation of the two theories. In ifTTI IIT has been formulated and further developed with the intention that it will satisfy
certain self-evident truths about consciousness, which Tononi refers to as axioms. In brief, the axioms are as follows:

• Existence: Consciousness exists. 

• Composition: Within the same experience, one can see, for example, left and right, red and green, a circle and a square, 
a red circle on the left, a green square on the right, and so on. 

• Information: Consciousness is informative: each experience differs in its particular way from an immense number of 
other possible experiences. 

• Integration: Each experience is irreducible to independent components. 

• Exclusion: At any given time there is only one experience having its full content. This axiom also states constraints on 
consciousness such as resolution. 

From these axioms IIT postulates a number of properties that physical systems must satisfy in order to generate consciousness.
These properties introduce a substantial amount of initial theory involving cause and effect within systems. Since this initial
theory is fundamental to the formulation of IIT, it is crucial that the set of axioms is correct and complete. The theory of the
present paper has a significant difference in emphasis because it uses the following axiom in its formulation:

• Relations: Consciousness is awash with underlying relationships which provide the relational content of experience. 

It is natural that relations should be fundamental to the formulation of a theory of consciousness because, in one form or 
another, they are ubiquitous among mathematical structures. Hence, in the author’s opinion, this axiom should be added 
to Tononi’s list of axioms. However, IIT does have something to say about the quality of conscious experience, and this is 
discussed below. 

It is worth noting that the theory in the present paper is more or less compatible with the IIT axioms. For example, regarding 
the unity of consciousness (integration), according to the theory in the present paper, when a brain state is interpreted in the 
context of the weighted relations that minimise expected multi-relational float entropy, the brain state acquires meaning in the 
form of the relational content of the associated experience. Furthermore, regarding resolution (part of the exclusion axiom) 
we recall that, for all but trivial examples, (1) will have many solutions and, hence, only defines weighted relations up to a
certain resolution that depends on the system. Of course a more rigorous comparison with the axioms is desirable, but this is 
for future work. 

4.2.1 Mechanisms contributing to consciousness 

One of the strengths of IIT is that it attempts to distinguish between brain regions that contribute toward consciousness and 
those that do not. This is undertaken at several different scales from small mechanisms (i.e. small subsystems) up to whole 
systems such as the brain. For this purpose, at the scale of mechanisms, the theory introduces a quantity called Integrated 
Information, and analogous quantities are introduced for larger scales. It is worth giving the reader some insight into how 
this quantity is defined for mechanisms. Suppose we have a small number of logic gates that are interconnected in some way, 
and that the resulting mechanism updates over discrete time. The current state of the mechanism (say at time t = 0) provides
causal information about what the state of the mechanism might have been at time t = -1. In fact it implies a probability
distribution on the set of all states of the mechanism for t = -1. If we were to partition the mechanism in some way by cutting
connections and treating cut inputs to gates as extrinsic noise then, in many cases, there would be a reduction in the amount
of causal information that the current state of the mechanism provides about what the state of the mechanism might have been
at t = -1. As in the case of the unpartitioned mechanism, the partitioned mechanism also implies a probability distribution
on the set of all states of the mechanism for t = -1. The reduction in the causal information is quantified by measuring the
distance between these two probability distributions using the Wasserstein metric, also known as the earth-mover’s distance.
If, out of all the different ways to partition the mechanism, the partition chosen actually loses the minimum amount of causal
information then the partition is called the minimum information partition (MIP) for the mechanism in its given state at t = 0.
In IIT, the quantity of causal information of a mechanism in its given state is defined as the Wasserstein distance between
the probability distribution for the state of the unpartitioned mechanism, at t = -1, and the probability distribution for the
state of the mechanism’s MIP at t = -1. In IIT there is also an analogous definition for the quantity of effect information of
a mechanism in its given state. Finally, the quantity of integrated information of a mechanism in its given state is defined as
the minimum of its causal information and its effect information for that state. The integration postulate of IIT says that only
when the quantity of integrated information is positive can a mechanism contribute to consciousness.
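
To make the first step of this description concrete, the following sketch computes the probability distribution on past states implied by the current state of a small deterministic mechanism. It is a simplification and not IIT's full procedure: it assumes a uniform prior over past states, treats the mechanism as a whole (no partitions), and uses a hypothetical two-gate mechanism (`update`) chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def update(state):
    """Hypothetical mechanism: gate A computes AND, gate B computes OR."""
    a, b = state
    return (a & b, a | b)


def past_distribution(current, n_nodes=2):
    """Distribution on states at t = -1 given the state at t = 0,
    assuming a uniform prior over past states (Bayes' rule)."""
    states = list(product((0, 1), repeat=n_nodes))
    causes = [s for s in states if update(s) == current]
    if not causes:
        raise ValueError("the given state has no possible cause")
    weight = Fraction(1, len(causes))
    return {s: (weight if s in causes else Fraction(0)) for s in states}


dist = past_distribution((0, 1))
```

Here the current state (0, 1) can be reached from (0, 1) or (1, 0), so the implied distribution puts probability 1/2 on each. Repeating the calculation for a partitioned version of the mechanism and measuring the distance between the two distributions would be the next step described in the text.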

We will now consider an example from na which formed part of the motivation behind the definition of integrated information.
We will see that there is an alternative (or complementary) interpretation of the example which leads in the direction of
the theory of the present paper. Consider a digital-camera sensor chip made up of 1 million photodiodes. From the perspective
of an external observer, the chip has a large number of different states. From an intrinsic perspective, however, the chip can 
be considered as 1 million independent photodiodes; cutting the chip down into individual photodiodes would not change the 
performance of the camera. It is hard to imagine that the chip can be conscious of the images that fall upon it. On the other 
hand, the visual experiences we enjoy are integrated and we experience whole images. Accordingly, cutting the visual cortex 
down into individual neurones would completely change the performance of the system. 

It is then stated in the example that what underlies the unity of experience is a multitude of causal interactions among the 
relevant parts of the brain. From this we can see why cause and effect is a fundamental part of the definitions used in IIT, 
and why the theory developed in the direction it did. An alternative (or complementary) interpretation of the example is that
the interactions between neurons make some states of the system more likely than other states; i.e. the system is inherently 
biased and this defines a probability distribution P on the set of states of the system. The probability distribution is a property 
of the system itself and allows the system to define expected quantities. This allows the theory of the present paper to be 
developed with an emphasis on relations, which is desirable since relationships are an inherent part of consciousness. 

Now let’s consider the camera chip in the context of the theory of the present paper. Each photodiode is unbiased since its 
state is driven by its input signal. The 1 million photodiodes are completely independent. If the chip defines a probability 
distribution on its states at all (which is debatable) then it is the uniform distribution. In Examples 3.3 and 3.4 of the present
paper, we saw that when P is uniform the solutions to (1) vary greatly and, hence, the system fails to define weighted relations
that give a coherent interpretation of the states of the system. Furthermore, the associated efe-histogram is without a left tail.
So, for contrast with IIT, the theory of the present paper suggests that, to contribute to consciousness, a mechanism will at
least need an inherent probability distribution on its set of states that gives an efe-histogram with a long left tail. The length of
the left tail may turn out to be of great importance; when the tail is very long, the solutions to (1) are very distinct from other
weighted relations. The length of the left tail is also important in multi-relational float entropy regarding which branches of
the base branching structure should be involved; see Subsection 4.1.

From a practical perspective, we might use cause and effect to estimate the inherent probability distribution of a mechanism.
For a deterministic mechanism, we can estimate the probability of a state $S_i$ as the number of states that cause $S_i$ divided
by the total number of states of the mechanism. Of course, Markov Chain theory is appropriate here (particularly in the
nondeterministic case) and a rigorous approach should be taken.
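
The counting estimate in this paragraph can be sketched as follows for a binary deterministic mechanism. The update rule `and_or` is a hypothetical example, and the function name `inherent_distribution` is not from the paper; a rigorous treatment would instead use the stationary distribution of the associated Markov chain.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def inherent_distribution(update, n_nodes):
    """Estimate P(S_i) as (number of states that cause S_i) / (number of states)."""
    states = list(product((0, 1), repeat=n_nodes))
    # Count, for every state, how many states are mapped onto it by one update.
    cause_counts = Counter(update(s) for s in states)
    return {s: Fraction(cause_counts[s], len(states)) for s in states}


def and_or(state):
    """Hypothetical two-gate mechanism: first node AND, second node OR."""
    a, b = state
    return (a & b, a | b)


P = inherent_distribution(and_or, n_nodes=2)
```

For this toy mechanism the state (0, 1) has two causes and (1, 0) has none, so the estimate is non-uniform even though each gate is very simple.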

4.2.2 The quality of consciousness 

Suppose we have a mechanism that has n states. In IIT (see ifT^), the Qualia space of the mechanism is an n-dimensional
space with a real axis given for each state of the mechanism. Each probability distribution on the set of states of the mechanism
defines a point in an (n - 1)-dimensional subspace of Q-space, noting that, for each probability distribution, probabilities
must sum to 1. The point closest to the origin in this subspace is given by the uniform distribution. For a given state of the
mechanism at t = 0, the state defines a probability distribution on the set of states of the mechanism at t = -1 and, hence,
defines a point in Q-space. Similarly, for the given state, each partitioned version of the mechanism (i.e. where only a subset
of the set of connections of the mechanism is present) also defines a point in Q-space. In IIT, some of these points in Q-space
are joined by q-arrows; the connections of the mechanism involved in determining the point at the bottom of a q-arrow
are included in the subset of connections involved in determining the point at the top of the q-arrow. This forms a lattice,
embedded in Q-space, which has the uniform distribution at its bottom and the distribution given by the complete mechanism 
at its top. The shape that the lattice encloses is called the quale, and the q-arrows are a geometric realization of information 
relationships. 
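
The remark that the uniform distribution is the point of this subspace closest to the origin can be checked directly. The short derivation below uses the Euclidean norm, which is an assumption about the intended notion of distance in Q-space.

```latex
% For a probability vector p = (p_1, ..., p_n) with \sum_i p_i = 1:
\[
\|p\|^2 \;=\; \sum_{i=1}^{n} p_i^2
        \;=\; \sum_{i=1}^{n} \Bigl(p_i - \tfrac{1}{n}\Bigr)^{2} + \frac{1}{n}
        \;\ge\; \frac{1}{n},
\]
% with equality if and only if p_i = 1/n for every i, i.e. p is uniform.
```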

Changing the state of the mechanism at t = 0 will, usually, change the shape of the lattice embedded in Q-space. According to
IIT, the shape completely specifies the quality of the experience, and it is suggested in ifT^ that similarity in shape corresponds 
to similarity in experience. The theory in ifT^ also suggests a way in which relationships, giving the geometry of monocular 
vision, might be defined in Q-space, although the theory has not been developed in a way that prioritises a capability for 
defining relationships. The property involved concerns q-arrows and is referred to, in ITSll, as entanglement. Suppose a lattice
in Q-space has a point $p_1$ that is at the bottom of two q-arrows $q_{1,2}$ and $q_{1,3}$ which terminate at points $p_2$ and $p_3$ respectively.
The connections of the mechanism involved in determining the points $p_2$ and $p_3$, when taken together, determine a point $p_4$.
Treating $p_1$ and $p_4$ as vectors from the origin, if $p_4 \neq p_1 + q_{1,2} + q_{1,3}$ then the q-arrows $q_{1,2}$ and $q_{1,3}$ are said to be tangled.
In other words, the information relationship given by $q_{1,4}$ does not reduce to the information relationships given by $q_{1,2}$ and
$q_{1,3}$.
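
The tangling condition can be tested numerically once the points are given. The sketch below assumes that a q-arrow from p1 to pj is represented by the difference vector pj - p1, so the condition is that p4 differs from p1 + q(1,2) + q(1,3). The function name `is_tangled`, the tolerance, and the example points are illustrative assumptions.

```python
def is_tangled(p1, p2, p3, p4, tol=1e-12):
    """Return True if p4 differs from p1 + q12 + q13, where q12 = p2 - p1
    and q13 = p3 - p1 are the q-arrows treated as difference vectors."""
    q12 = [b - a for a, b in zip(p1, p2)]
    q13 = [b - a for a, b in zip(p1, p3)]
    predicted = [a + x + y for a, x, y in zip(p1, q12, q13)]
    return any(abs(u - v) > tol for u, v in zip(p4, predicted))


# Points are probability distributions on a 4-state mechanism (illustrative values).
p1 = [0.25, 0.25, 0.25, 0.25]   # bottom of both q-arrows
p2 = [0.30, 0.20, 0.25, 0.25]
p3 = [0.25, 0.25, 0.30, 0.20]

additive = is_tangled(p1, p2, p3, [0.30, 0.20, 0.30, 0.20])   # the two arrows simply add
tangled = is_tangled(p1, p2, p3, [0.40, 0.10, 0.40, 0.10])    # the combined point is elsewhere
```

In the first call the combined point equals the sum of the two arrows applied to p1, so the arrows are not tangled; in the second it does not, so they are.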

With respect to vision, it is suggested in IfT^ that, for a mechanism in the form of a grid, connections of the mechanism that 
are close together will give entangled q-arrows in Q-space near the bottom of the lattice, but connections of the mechanism 
that are far apart will not. Hence, these entanglements give rise to the concept of local regions and, therefore, geometry. 
From the perspective of the author of the present paper, entanglement is a desirable addition to the theory of integrated 
information since it acknowledges the need for the theory to include the capacity to define relationships. For comparison 
regarding the quality of consciousness, the aim of the present paper is to provide a mathematical theory for how the brain 
defines the relationships underlying consciousness. If applicable to the visual cortex, the examples in Section 3 show that the
perceived relationships between different colours, the perceived relationship between different brightnesses, and the perceived
relationship between different points in a person’s field of view (giving geometry) are all defined by the brain in a mutually
dependent way. If we were to apply the theory to the auditory cortex then the resulting weighted relations might define how
we perceive the relationships between the pitches of the chromatic scale. Of course, more work is required. Although an
early example, when considering the scope of the theory of the present paper, readers may find the Definitive Player Problem
to be of interest; see m. In short, when IIT leans in the direction of defining relationships, synergies start to emerge with the
theory of the present paper.

4.2.3 A comparison of float entropy and Shannon entropy 

Shannon entropy is notably used in the neuroscience of consciousness. The definition of float entropy (see Definitions 1.5
and 4.1) has some similarity to that of Boltzmann’s entropy. Whilst not to be confused with Shannon entropy, expected float
entropy, efe, does have some similarities with Shannon entropy. Indeed, efe is a measure (in bits per data element) of the
expected amount of information needed, to specify the state of the system, beyond what is already known about the system
from the weighted relations provided. Shannon entropy is a measure of information content in data. As data becomes more
random, Shannon entropy increases because structure in data is actually a form of redundancy. By solving (1) for a given
system we obtain a structure in the form of weighted relations defined by the system. Relative to these weighted relations, if
the system were to become more random then the efe value for the system would increase. In order to make the similarities
between efe and Shannon entropy clearer, consider the summation

\[
-\sum_{S_i \in \Omega_{S,V}} P(S_i) \log_2\!\left( \frac{P(S_i)}{\sum_{S_j \in A_{S_i}} P(S_j)} \right), \tag{3}
\]

where $A_{S_i} := \{S_j \in \Omega_{S,V} : d(R, R\{U,S_j\}) \le d(R, R\{U,S_i\})\}$. The summation in (3) is similar in form to the definition of
Shannon entropy. Furthermore, (3) can be written as

\[
\sum_{S_i \in \Omega_{S,V}} P(S_i) \log_2\!\left( \sum_{S_j \in A_{S_i}} \frac{P(S_j)}{P(S_i)} \right) \tag{4}
\]

and, when the probabilities in the argument of the logarithm are comparable, this will give a value similar to efe(R,U,P).
Finally, we can write (4) as

\[
H + \sum_{S_i \in \Omega_{S,V}} P(S_i) \log_2\!\left( \sum_{S_j \in A_{S_i}} P(S_j) \right), \tag{5}
\]

where H is the Shannon entropy of the system and, with consideration of the log function, the second term has a negative
value between -H and 0. As per Example 3.4, even when P is uniform over $\Omega_{S,V}$, the second term of (5) need not be equal
to 0. However, for U and R the constant functions which everywhere take the value 1, (5) simplifies to H.
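
The relationship between efe and Shannon entropy can be explored numerically on a toy system. The sketch below makes several assumptions that should be read as illustrative only: the metric d is taken to be the sum of absolute entry differences, the system has two binary nodes, and the summation compared with efe is taken to be the sum over states of P(S_i) log2(P(A_{S_i}) / P(S_i)), where A_{S_i} is the set of data elements S_j with d(R, R{U,S_j}) <= d(R, R{U,S_i}). All function names are hypothetical.

```python
from math import log2


def induced_relation(U, state):
    """R{U,S_i}: weighted relation on nodes with entries U(f_i(a), f_i(b))."""
    n = len(state)
    return [[U[state[a]][state[b]] for b in range(n)] for a in range(n)]


def d1(R1, R2):
    """Assumed metric: sum of absolute differences between table entries."""
    return sum(abs(x - y) for row1, row2 in zip(R1, R2) for x, y in zip(row1, row2))


def efe_and_decomposition(R, U, P):
    """Return (efe, ratio_sum, H, second_term) for a distribution P on states."""
    dist = {s: d1(R, induced_relation(U, s)) for s in P}
    efe = ratio_sum = second_term = H = 0.0
    for s, p in P.items():
        A = [t for t in P if dist[t] <= dist[s]]   # the set A_{S_i}
        p_A = sum(P[t] for t in A)
        efe += p * log2(len(A))                    # float entropy weighted by P
        if p > 0:
            ratio_sum += p * log2(p_A / p)         # Shannon-like summation
            second_term += p * log2(p_A)           # correction to H
            H -= p * log2(p)                       # Shannon entropy
    return efe, ratio_sum, H, second_term


# Two binary nodes that tend to agree (biased P).
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
R = [[1, 1], [1, 1]]          # the two nodes are strongly related
U = [[1, 0], [0, 1]]          # the two node states are unrelated
efe, ratio_sum, H, second_term = efe_and_decomposition(R, U, P)

# With U and R constant and equal to 1, every A_{S_i} is the whole state space.
ones = [[1, 1], [1, 1]]
efe_c, ratio_c, H_c, second_c = efe_and_decomposition(ones, ones, P)
```

In this example `ratio_sum` equals `H + second_term`, the second term lies between -H and 0, and for constant relations the summation reduces to H while efe takes its maximum value of 2 bits.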

4.3 Conclusion


The present paper significantly extends the work introduced in HI by further developing theory and testing theory using 
several informative examples. We have noted the following two facts. Firstly, conscious experience is awash with underlying 
relationships. Secondly, for various brain regions, such as the visual cortex, the probability distribution over the different 
possible states of the system is far from being uniform owing to the effect of learning rules that weaken or strengthen 
synapses. Hebb’s principle says that what fires together wires together and the BCM version of Hebbian theory is one 
of many such learning paradigms; see 12 and 0. There is also evidence for the relevance of BCM theory regarding the 
hippocampus; see a. Furthermore, the probability distribution over the states of the system is a property of the system itself 
allowing the system to define expected quantities. The theory in the present paper provides a link between the above facts. 
Under the requirement of minimising expected (multi-relational) float entropy, the brain defines relationships; the theory 
represents relationships using weighted relations. It is proposed that when a brain state is interpreted in the context of all 
these weighted relations, defined by the brain, the brain state acquires meaning in the form of the relational content of the 
associated experience. The examples in the present paper provide evidence that supports the theory. 

In Example 3.1, T was obtained from digital photographs having a four shade gray scale. In this case, T has defined the
correct relationships under the requirement that efe is minimised. Similarly, in Example 3.5, T was obtained from digital
photographs having a nine colour red/green palette. We note that, given the system involved, R and U in this example also
appear to be favorable weighted relations, and appropriate as approximate solutions to (1). However, in this case R and U
were guessed and judged appropriate from the efe-histogram; the actual solutions to (1) could be somewhat different.

The results in these examples suggest that the perceived relationships between different colours, the perceived relationships 
between different brightnesses, and the perceived relationships between different points in a person’s field of view (giving 
geometry) are all defined by the brain in a mutually dependent way. Hence, in this case, there is a connection between the 
relationships that underlie colour perception and our perception of the underlying geometry of the world around us.

If we were to apply the theory to the auditory cortex then the resulting weighted relations might define how we perceive the 
relationships between the pitches of the chromatic scale. Of course, more work is required. Although an early example, when 
considering the scope of the theory, readers may find the Definitive Player Problem to be of interest; see m. 

In Example 3.4 we applied the theory to a system where the probability distribution P in Definition 1.1 is uniform over $\Omega_{S,V}$.
In this case the solutions to (1) vary greatly (instead of all being similar) and, hence, the system fails to define weighted
relations that give a coherent interpretation of the states of the system. We found that the variation in the solutions to (1) is
partly due to a type of symmetry within T; this is discussed in Example 3.4. Also, the associated efe-histogram is without a
left tail. This example supports the claim that the theory may satisfy the empirical observation that not all systems appear
to be capable of consciousness.
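
The efe-histograms referred to here are built from Monte-Carlo samples in which the weighted relations are chosen uniformly at random. The following is a minimal sketch of that sampling step, not the author's URFinder software: it assumes the sum-of-absolute-differences metric, a small system of three binary nodes, and hypothetical function names. No claim is made about the shape of the resulting histograms for this toy system.

```python
import random
from itertools import product
from math import log2


def random_relation(size, rng):
    """Reflexive, symmetric weighted relation; off-diagonal weights uniform on [0, 1]."""
    M = [[1.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            M[i][j] = M[j][i] = rng.random()
    return M


def efe(R, U, P):
    """Expected float entropy with the (assumed) sum-of-absolute-differences metric."""
    n = len(R)
    dist = {s: sum(abs(R[a][b] - U[s[a]][s[b]]) for a in range(n) for b in range(n))
            for s in P}
    return sum(p * log2(sum(1 for t in P if dist[t] <= dist[s])) for s, p in P.items())


def efe_samples(P, n_nodes, n_values, trials, seed=0):
    """Collect efe observations for randomly chosen R (on nodes) and U (on node states)."""
    rng = random.Random(seed)
    return [efe(random_relation(n_nodes, rng), random_relation(n_values, rng), P)
            for _ in range(trials)]


states = list(product((0, 1), repeat=3))
uniform = {s: 1 / len(states) for s in states}
# Hypothetical bias: the two all-agree states are four times as likely as the others.
weights = {s: (4 if len(set(s)) == 1 else 1) for s in states}
total = sum(weights.values())
biased = {s: w / total for s, w in weights.items()}

uniform_obs = efe_samples(uniform, n_nodes=3, n_values=2, trials=200)
biased_obs = efe_samples(biased, n_nodes=3, n_values=2, trials=200)
```

Binning `uniform_obs` and `biased_obs` gives two efe-histograms that can be compared for the presence or absence of a left tail.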

In Subsection 3.1.3 we investigated the effect of applying base changing operations. Typical data was obtained for seven
systems from digital photographs. For each of the systems investigated, Figure 10 shows that, when changing back to
the original base from the alternative base, the efe-histogram undergoes an increase in negative skewness and mean minus
minimum as well as a right shift in location. Furthermore, for most of the systems, the minimum efe value observed, when
using the original base, is to the left of the minimum efe value observed when using the alternative base. These results
suggest that the bases maximising the length of the left tail of the efe-distribution (here approximated by an efe-histogram)
are important for the theory presented in the present paper. However, instead of permitting all base changing operations,
restricting the theory to the base branching structure may be necessary; see Subsection 3.1.2.

It is argued in m that the theory presented there provides a solution to the binding problem and avoids the homunculus 
fallacy. Those arguments also apply to the theory presented in the present paper. In particular, consciousness is not the output 
of some algorithmic process but it may instead, largely, be the states of the system interpreted in the context of the weighted 
relations that minimise expected multi-relational float entropy; see Definition 4.1. The weighted relations that Definition 4.1
involves, in addition to U and R, are brought into play by increasing the number of conditions in Definition 1.5. The extra
conditions utilise higher bases of the base branching structure. The findings of the present paper suggest that the conditions
$C_0, C_1, \ldots$ should be those that increase the length of the left tail of the efe-distribution.

In Subsection 3.2 we investigated joining and partitioning systems. Table 12 provides an example where the minimum efe
of the system is greater than 3 whilst, after partitioning, the sum of the minimum efes is only 2.8. 

In Subsection 3.3 we considered whether the theory presented in the present paper is independent of the choice of metric
used in Definition 1.5 when the metric determines a total order on [0,1] in some natural way. In this case, the meaning of
the values in weighted relation tables is determined by the metric being used. Example 3.6 and Lemma [3^ provide some
evidence of such independence. However, some more work is required.

In Subsection 4.2 we made some comparisons between the theory of the present paper, Integrated Information Theory,
and Shannon entropy. The integration postulate of IIT says that only when the quantity of integrated information is positive
can a mechanism contribute to consciousness. For comparison, the theory of the present paper suggests that, to contribute
to consciousness, a mechanism will at least need an inherent probability distribution on its set of states that gives an
efe-histogram with a long left tail. The length of the left tail may turn out to be of great importance.

According to IIT, the shape of a quale in Q-space completely specifies the quality of the experience, and it is suggested in lIT^ 
that similarity in shape corresponds to similarity in experience. The theory in lIT^ also suggests a way in which relationships 
might be defined in Q-space by entangled q-arrows. For comparison, the theory of the present paper suggests that, under 
the requirement of minimising expected (multi-relational) float entropy, the brain defines relationships (represented in the 
theory by weighted relations) such that when a brain state is interpreted in the context of all these relationships the brain state 
acquires meaning in the form of the relational content of the associated experience. 

Finally, in Subsection 4.2.3 we showed that efe is a measure (in bits per data element) of the expected amount of information
needed, to specify the state of the system, beyond what is already known about the system from the weighted relations 
provided. 

It is hoped that future research will someday determine the extent to which the word ‘quasi’ can be removed from the title of
the present paper. Whilst rather different in content, readers may also find ifTSl . |[T6l . ifTTl and lITSl to be of interest. 


A Software 


Table 14: Software used during the research for the present paper.

Software                   Availability      Use in the present paper

GIMP 2.6                   Freeware          Used to posterise digital raster images, i.e. reduce the palette size
                                             to a small number of shades or colours.

RasterSampler 1.0 (Java)   From the author   Used to sample pixels and collate data.

URFinder 3.7 (Java)        From the author   Used to search for solutions to (1) and collect observations for
                                             efe-histograms.

Excel 2007                 Microsoft         Used to generate binary entry tables (such as those in Table 4), store
                                             outputs and perform statistical analysis.

Minitab 17                 Minitab Inc       Statistical analysis.


URFinder 3.7 can be used to implement the binary search algorithm specified in Subsection 2.2, and for collecting
observations from which efe-histograms can be produced. The author ran URFinder 3.7 on a desktop dual-core CPU machine,
and is happy to distribute the software. The algorithm and machine were chosen for convenience and their performance (i.e.
the maximum size of system that can practically be investigated) is far from what could potentially be achieved. Indeed, for a
system with n = #S and m = #V, Step 2 of the binary search algorithm calculates exact efe values. This is
computationally expensive for all but quite small systems, particularly since the algorithm calculates exact efe values rather
than estimates obtained by employing statistical methods.
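
As a rough sense of scale, and assuming only the definitions summarised in Table 15 (data elements are maps from S to V, and weighted relations are reflexive and symmetric), the sizes involved can be written as follows. The numerical values are an illustrative choice and not one of the systems studied in the paper.

```latex
% Number of data elements, and number of free (off-diagonal) entries in R and in U:
\[
\#\Omega_{S,V} = m^{n}, \qquad
\#\{\text{free entries of } R\} = \tfrac{1}{2}\,n(n-1), \qquad
\#\{\text{free entries of } U\} = \tfrac{1}{2}\,m(m-1).
\]
% Illustration: n = 9 and m = 4 give 4^9 = 262144 data elements,
% with 36 free entries in R and 6 free entries in U.
```

An exact efe value requires a distance for every one of the $m^n$ data elements, which is why exact evaluation becomes impractical as n grows.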

For future investigations we could consider taking advantage of the continuing increase in power and affordability of multi- 
GPU machines and hybrid CPU-GPU machines. The use of GPUs can result in orders of magnitude improvement in speed 
over conventional processors. Furthermore, (1) is an optimisation problem and falls within a common general class of
problems studied in optimisation theory for which a number of efficient algorithms are available. These include gradient
methods, stochastic gradient methods and derivative-free optimisation; see m, iioi and mi.


B Notation 


Table 15: Notation (most of the formal definitions can be found in Subsections 1.1, 3.3 and 4.1).

Symbol                                        Description

$a, b, c, \ldots$                             elements of $S$, but also used to denote elements of other sets when the
                                              meaning is clear from the context.
$A$                                           an element of $2^{\Omega_{S,V}}$.
$B_n$                                         the Bell number for $\#S = n$.
$B(x) = \exp(e^x - 1)$                        the generating function of $B_n$.
$C_0, C_1, C_2, \ldots$                       conditions, involving weighted relations, in the definition of
                                              multi-relational float entropy.
$d$                                           a metric on the set of all weighted relations on $S$ or, in places, a
                                              metric on $[0,1]^n$.
$d_n$                                         for $n \in \mathbb{N} \cup \{\infty\}$, a metric (on the set of all weighted
                                              relations on $S$) obtained from the corresponding $p$-norm, for $p = n$,
                                              on a finite dimensional vector space.
$d_f$                                         a metric on $\mathbb{R}^n$; a function $f : \mathbb{R}^n \to \mathbb{R}^n$ is
                                              used in its definition.
$d_f$                                         a metric on $[0,1]^n$; a function $f : [0,1] \to [0,1]$ is used in its
                                              definition.
$\le_d$                                       a total order on $[0,1]$ determined by the metric $d$.
$I_d$                                         an interval determined by the metric $d$.
$efe(R,U,P)$                                  the expected float entropy, relative to $U$ and $R$, of the given system.
$efe(R,U,T)$                                  the mean approximation of $efe(R,U,P)$.
$fe(R,U,S_i)$                                 the float entropy, relative to $U$ and $R$, of the data element $S_i$.
$fe(R,U,R_1,U_1,R_2,U_2,\ldots,S_i)$          the multi-relational float entropy, relative to $U, U_1, U_2, \ldots$ and
                                              $R, R_1, R_2, \ldots$, of the data element $S_i$.
$f_i$                                         the map $f_i : S \to V$ corresponding to the data element $S_i$.
node 1, node 2, node 3, ...                   elements of $S$.
$P$                                           the probability distribution $P : \Omega_{S,V} \to [0,1]$ of the random
                                              variable defined by the bias of the given system. $P$ extends to a
                                              probability measure on $2^{\Omega_{S,V}}$.
$R$                                           an element of $\Psi_S$.
$R\{U,S_i\}$                                  the element of $\Psi_S$ given by the canonical definition
                                              $R\{U,S_i\}(a,b) := U(f_i(a), f_i(b))$ for all $a, b \in S$.
$S$                                           a nonempty finite set; in most places $S$ denotes the set of nodes of a
                                              system.
$S_i$                                         a data element for $S$, i.e. a system state given by the aggregate of the
                                              node states.
$T$                                           the typical data for the given system, i.e. $T$ is a finite set of
                                              numbered observations of the given system.
$T$                                           the map $T : \{1, \ldots, \#T\} \to \{f_i : S_i \in \Omega_{S,V}\}$ for which
                                              $T(k)$ is the value of observation number $k$ in $T$. $T$ need not be
                                              injective.
$U$                                           an element of $\Psi_V$.
$v_1, v_2, v_3, \ldots$                       elements of $V$.
$V$                                           the node repertoire, i.e. the set of node states for a given system.
$\Psi_S$                                      the set of all reflexive, symmetric weighted-relations on $S$.
$\Psi_V$                                      the set of all reflexive, symmetric weighted-relations on $V$.
$\Omega_{S,V}$                                the set of all data elements $S_i$, given $S$ and $V$.
$2^{\Omega_{S,V}}$                            the power set of $\Omega_{S,V}$.